Preface


Why have we written this book? In recent decades the field of financial risk man- 
agement has undergone explosive development. This book is devoted specifically to 
quantitative modelling issues arising in this field. As a result of our own discussions 
and joint projects with industry professionals and regulators over a number of years, 
we felt there was a need for a textbook treatment of quantitative risk management 
(QRM) at a technical yet accessible level, aimed at both industry participants and 
students seeking an entrance to the area. 

We have tried to bring together a body of methodology that we consider to be core 
material for any course on the subject. This material and its mode of presentation 
represent the blending of our own views, which come from the perspectives of 
financial mathematics, insurance mathematics and statistics. We feel that a book 
combining these viewpoints fills a gap in the existing literature and partly anticipates 
the future need for quantitative risk managers in banks, insurance companies and 
beyond with broad, interdisciplinary skills. 


Who was this book written for? This book is primarily a textbook for courses 
on QRM aimed at advanced undergraduate or graduate students and professionals 
from the financial industry. A knowledge of probability and statistics at least at the 
level of a first university course in a quantitative discipline and familiarity with 
undergraduate calculus and linear algebra are fundamental prerequisites. Though 
not absolutely necessary, some prior exposure to finance, economics or insurance 
will be beneficial for a better understanding of some sections. 

The book has a secondary function as a reference text for risk professionals inter- 
ested in a clear and concise treatment of concepts and techniques used in practice. 
As such, we hope it will facilitate communication between regulators, end-users and 
academics. 

A third audience for the book is the growing community of researchers working in 
the area. Most chapters take the reader to the frontier of current, practically relevant 
research and contain extensive, annotated references that guide the reader through 
the burgeoning literature. 


Ways to use this book. Based on our experience of teaching university courses 
on QRM at ETH Zurich, the Universities of Zurich and Leipzig and the London 
School of Economics, a two-semester course of 3–4 hours a week can be based on
material in Chapters 2–8 and parts of Chapter 10; Chapter 1 is typically given as
background reading material. Chapter 9 is a more technically demanding chapter 
that has been included because of the current interest in quantitative methods for 
pricing and hedging credit derivatives; it is primarily intended for more advanced, 
specialized courses on credit risk (see below).

A course on market risk can be based on a fairly complete treatment of
Chapters 2–4, with excursions into material in Chapters 5, 6 and 7 (normal mixture
copulas, coherent risk measures, extreme value methods for threshold exceedances) 
as time permits. 

A course on credit risk can be based on Chapters 8 and 9 but requires a preliminary 
treatment of some topics in earlier chapters. Sections 2.1 and 2.2 give the necessary 
grounding in basic concepts; Sections 3.1, 3.2, 3.4, 5.1 and 5.4 are necessary for 
an understanding of multivariate models of portfolio credit risk; and Sections 6.1 
and 6.3 are required to understand how capital is allocated to credit risks. 

A short course or seminar on operational risk could be based on Chapter 10, 
but would also benefit from some supplementary material from other chapters; 
Sections 2.1 and 2.2 and Chapters 6 and 7 are particularly relevant. 

It is also possible to devise more specialized courses, such as a course on risk- 
measurement and aggregation concepts based on Chapters 2, 5 and 6, or a course on 
risk-management techniques for financial econometricians based on Chapters 2–4
and 7. Material from various chapters could be used as interesting examples to 
enliven statistics courses on subjects like multivariate analysis, time series analysis 
and generalized linear modelling. 


What we have not covered. We have not been able to address all topics that a reader 
might expect to find under the heading of QRM. Perhaps the most obvious omission 
is the lack of a section on the risk management of derivatives by hedging. We felt here 
that the relevant techniques, and the financial mathematics required to understand 
them, are already well covered in a number of excellent textbooks. Other omissions 
include RAROC (risk-adjusted return on capital) and performance-measurement 
issues. Besides these larger areas, many smaller issues have been neglected for 
reasons of space, but are mentioned with suggestions for further reading in the 
“Notes and Comments” sections, which should be considered as integral parts of 
the text. 


Acknowledgements. The origins of this book date back to 1996, when A.M. and 
R.F. began postdoctoral studies in the group of P.E. at the Federal Institute of Tech- 
nology (ETH) in Zurich. All three authors are grateful to ETH for providing the 
environment in which the project flourished. A.M. and R.F. thank Swiss Re and 
UBS, respectively, for providing the financial support for their postdoctoral posi- 
tions. R.F. has subsequently held positions at the Swiss Banking Institute of the 
University of Zurich and at the University of Leipzig and is grateful to both institu- 
tions for their support. 

The Forschungsinstitut für Mathematik (FIM) of the ETH Zurich provided finan- 
cial support at various stages of the project. At a crucial juncture in early 2004 
the Mathematisches Forschungsinstitut Oberwolfach was the venue for a memorable
week of progress. P.E. recalls fondly his time as Centennial Professor of Finance at 
the London School of Economics; numerous discussions with colleagues from the 
Department of Accounting and Finance helped in shaping his view of the importance 
of QRM. We also acknowledge the invaluable contribution of RiskLab Zurich to the 
enterprise: the agenda for the book was strongly influenced by joint projects and
discussions with the RiskLab sponsors UBS, Credit Suisse and Swiss Re. We have 
also benefited greatly from the NCCR FINRISK research program in Switzerland, 
which funded doctoral and postdoctoral research on topics in the book. 

We are indebted to numerous proof-readers who have commented on various 
parts of the manuscript, and to colleagues in Zurich, Leipzig and beyond who 
have helped us in our understanding of QRM and the mathematics underlying it. 
These include Stefan Altner, Philippe Artzner, Jochen Backhaus, Guus Balkema, Uta
Beckmann, Reto Baumgartner, Wolfgang Breymann, Reto Bucher, Hans Bühlmann,
Peter Bühlmann, Valérie Chavez-Demoulin, Dominik Colangelo, Freddy Delbaen,
Rosario Dell’Aquila, Stefan Denzler, Alexandra Dias, Stefano Demarta, Damir
Filipović, Gabriel Frahm, Hansjörg Furrer, Rajna Gibson, Kay Giesecke, Enrico
De Giorgi, Bernhard Hodler, Andrea Höing, Christoph Hummel, Alessandro Juri,
Roger Kaufmann, Philipp Keller, Hans Rudolf Künsch, Filip Lindskog, Hans-Jakob
Lüthi, Natalia Markovich, Benoit Metayer, Johanna Nešlehová, Monika Popp,
Giovanni Puccetti, Hanspeter Schmidli, Sylvia Schmidt, Thorsten Schmidt, Uwe
Schmock, Philipp Schönbucher, Martin Schweizer, Torsten Steiger, Daniel Straumann,
Dirk Tasche, Eduardo Vilela, Marcel Visser and Jonathan Wendin. For her
help in preparing the manuscript we thank Gabriele Baltes.

We thank Richard Baggaley and the team at Princeton University Press for all 
their help in the production of this book. We are also grateful to our anonymous 
referees who provided us with exemplary feedback, which has shaped this book for 
the better. Special thanks go to Sam Clark at T&T Productions Ltd, who took our
uneven LaTeX code and turned it into a more polished book with remarkable speed
and efficiency. 

To our wives, Janine, Catharina and Gerda, and our families our sincerest debt of 
gratitude is due. Though driven to distraction no doubt by our long contemplation 
of risk, without obvious reward, their support was constant. 


Further resources. Readers are encouraged to visit the book’s homepage at 
www.pupress.princeton.edu/titles/8056.html 


to find supplementary resources for this book. Our intention is to make available the 
computer code (mostly S-PLUS) used to generate the examples in this book, and to 
list errata. 


Special abbreviations. A number of abbreviations for common terms in probability 
are used throughout the book; these include “rv” for random variable, “df” for 
distribution function, “iid” for independent and identically distributed and “se” for 
standard error.


Risk in Perspective


In this chapter we provide a non-mathematical discussion of various issues that form 
the background to the rest of the book. In Section 1.1 we begin with the nature of risk 
itself and how risk relates to randomness; in the financial context (which includes 
insurance) we summarize the main kinds of risks encountered and explain what it 
means to measure and manage such risks. 

A brief history of financial risk management, or at least some of the main ideas 
that are used in modern practice, is given in Section 1.2, including a summary of the 
process leading to the Basel Accords. Section 1.3 gives an idea of the new regulatory 
framework that is emerging in the financial and insurance industries. 

In Section 1.4 we take a step back and attempt to address the fundamental question 
of why we might want to measure and manage risk at all. Finally, in Section 1.5, we 
turn explicitly to quantitative risk management (QRM) and set out our own views 
concerning the nature of this discipline and the challenge it poses. This section in 
particular should give more insight into why we have chosen to address the particular 
methodological topics in this book. 


1.1 Risk 


The Concise Oxford English Dictionary defines risk as “hazard, a chance of bad 
consequences, loss or exposure to mischance”. In a discussion with students tak- 
ing a course on financial risk management, ingredients which typically enter are 
events, decisions, consequences and uncertainty. Mostly only the downside of risk 
is mentioned, rarely a possible upside, i.e. the potential for a gain. For financial 
risks, the subject of this book, we might arrive at a definition such as “any event or 
action that may adversely affect an organization’s ability to achieve its objectives 
and execute its strategies” or, alternatively, “the quantifiable likelihood of loss or 
less-than-expected returns”. But while these capture some of the elements of risk, 
no single one-sentence definition is entirely satisfactory in all contexts. 


1.1.1 Risk and Randomness 


Independently of any context, risk relates strongly to uncertainty, and hence to the 
notion of randomness. Randomness has eluded a clear, workable definition for many 
centuries; it was not until 1933 that the Russian mathematician A. N. Kolmogorov 
gave an axiomatic definition of randomness and probability (see Kolmogorov 1933). 
This definition and its accompanying theory, though not without their controversial 
aspects, now provide the lingua franca for discourses on risk and uncertainty, such
as this book. 

In Kolmogorov’s language a probabilistic model is described by a triplet
$(\Omega, \mathcal{F}, P)$. An element $\omega$ of $\Omega$ represents a realization of an experiment, in
economics often referred to as a state of nature. The statement “the probability that
an event $A$ occurs” is denoted (and in Kolmogorov’s axiomatic system defined)
as $P(A)$, where $A$ is an element of $\mathcal{F}$, the set of all events. $P$ denotes the
probability measure. For the less mathematically trained reader it suffices to accept
that Kolmogorov’s system translates our intuition about randomness into a concise,
axiomatic language and clear rules.

Consider the following examples: an investor who holds stock in a particular 
company; an insurance company that has sold an insurance policy; an individual 
who decides to convert a fixed-rate mortgage into a variable one. All of these sit- 
uations have something important in common: the investor holds today an asset 
with an uncertain future value. This is very clear in the case of the stock. For the 
insurance company, the policy sold may or may not be triggered by the underly- 
ing event covered. In the case of a mortgage, our decision today to enter into this 
refinancing agreement will change (for better or for worse) the future repayments. 
So randomness plays a crucial role in the valuation of current products held by the 
investor, the insurance company or the home owner. 

To model these situations a mathematician would now define a one-period risky
position (or simply risk) $X$ to be a function on the probability space $(\Omega, \mathcal{F}, P)$;
this function is called a random variable. We leave for the moment the range of $X$
(i.e. its possible values) unspecified. Most of the modelling of a risky position $X$
concerns its distribution function $F_X(x) = P(X \le x)$, the probability that by the
end of the period under consideration, the value of the risk $X$ is less than or equal
to a given number $x$. Several risky positions would then be denoted by a random
vector $(X_1, \ldots, X_d)$, also written in bold face as $\boldsymbol{X}$; time can be introduced, leading
to the notion of random (or so-called stochastic) processes, usually written $(X_t)$.
Throughout this book we will encounter many such processes, which serve as
essential building blocks in the mathematical description of risk.

We therefore expect the reader to be at ease with basic notation, terminology and 
results from elementary probability and statistics, the branch of mathematics dealing 
with stochastic models and their application to the real world. The word “stochastic” 
is derived from the Greek “Stochazesthai”, the art of guessing, or “Stochastikos”,
meaning skilled at aiming, “stochos” being a target. In discussing stochastic methods
for risk management we hope to emphasize the skill aspect rather than the guesswork.


1.1.2 Financial Risk 


In this book we discuss risk in the context of finance and insurance (although many 
of the tools introduced are applicable well beyond this context). We start by giving 
a brief overview of the main risk types encountered in the financial industry. 

In banking, the best known type of risk is probably market risk, the risk of a change 
in the value of a financial position due to changes in the value of the underlying 
components on which that position depends, such as stock and bond prices, exchange
rates, commodity prices, etc. The next important category is credit risk, the risk of 
not receiving promised repayments on outstanding investments such as loans and 
bonds, because of the “default” of the borrower. A further risk category that has 
received a lot of recent attention is operational risk, the risk of losses resulting from 
inadequate or failed internal processes, people and systems, or from external events. 

The boundaries of these three risk categories are not always clearly defined, nor 
do they form an exhaustive list of the full range of possible risks affecting a finan- 
cial institution. There are notions of risk which surface in nearly all categories 
such as liquidity and model risk. The latter is the risk associated with using a mis- 
specified (inappropriate) model for measuring risk. Think, for instance, of using the 
Black-Scholes model for pricing an exotic option in circumstances where the basic 
Black-Scholes model assumptions on the underlying securities (such as the assump- 
tion of normally distributed returns) are violated. It may be argued that model risk 
is always present to some degree. Liquidity risk could be roughly defined as the risk 
stemming from the lack of marketability of an investment that cannot be bought or 
sold quickly enough to prevent or minimize a loss. Liquidity can be thought of as 
“oxygen for a healthy market”; we need it to survive but most of the time we are 
not aware of its presence. Its absence, however, is mostly recognized immediately, 
with often disastrous consequences. 

The concepts, techniques and tools we will introduce in the following chapters 
mainly apply to the three basic categories of market, credit and operational risk. We 
should stress that the only viable way forward for a successful handling of financial 
risk consists of a holistic approach, i.e. an integrated approach taking all types of 
risk and their interactions into account. Whereas this is a clear goal, current models 
do not yet allow for a fully satisfactory platform. 

As well as banks, the insurance industry has a long-standing relationship with 
risk. It is no coincidence that the Institute of Actuaries and the Faculty of Actuaries 
use the following definition of the actuarial profession. 


Actuaries are respected professionals whose innovative approach to 
making business successful is matched by a responsibility to the public 
interest. Actuaries identify solutions to financial problems. They man- 
age assets and liabilities by analysing past events, assessing the present 
risk involved and modelling what could happen in the future. 


An additional risk category entering through insurance is underwriting risk, the 
risk inherent in insurance policies sold. Examples of risk factors that play a role 
here are changing patterns of natural catastrophes, changes in demographic tables 
underlying (long-dated) life products, or changing customer behaviour (such as 
prepayment patterns). 


1.1.3 Measurement and Management 


Much of this book is concerned with techniques for the measurement of risk, an 
activity which is part of the process of managing risk, as we attempt to clarify in 
this section. 


Risk measurement. Suppose we hold a portfolio consisting of $d$ underlying
investments with respective weights $w_1, \ldots, w_d$ so that the change in value of the portfolio
over a given holding period (the so-called P&L, or profit and loss) can be written as
$X = \sum_{i=1}^{d} w_i X_i$, where $X_i$ denotes the change in value of the $i$th investment.
Measuring the risk of this portfolio essentially consists of determining its distribution
function $F_X(x) = P(X \le x)$, or functionals describing this distribution function
such as its mean, variance or 99th percentile.

In order to achieve this, we need a properly calibrated joint model for the
underlying random vector of investments $(X_1, \ldots, X_d)$. We will consider this problem in
more detail in Chapter 2. At this point it suffices to understand that risk measurement 
is essentially a statistical issue; based on historical observations and given a specific 
model, a statistical estimate of the distribution of the change in value of a position, 
or one of its functionals, is calculated. As we shall see later, and this is indeed a main 
theme throughout the book, this is by no means an easy task with a unique solution. 
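
As an illustration of this statistical view, the following Python sketch estimates the mean, the variance and the 99th percentile of the portfolio P&L (the weighted sum of the changes in value of the individual investments) from a set of scenarios. The weights, the simulated independent standard normal changes in value and the helper names portfolio_pnl, empirical_df and empirical_quantile are assumptions made purely for illustration; they stand in for historical observations and a properly calibrated joint model, and are not a method prescribed in the text.

```python
import math
import random
import statistics


def portfolio_pnl(weights, changes):
    """Portfolio P&L for one scenario: the weighted sum of the changes in value."""
    return sum(w * x for w, x in zip(weights, changes))


def empirical_df(sample, x):
    """Empirical estimate of the distribution function, P(X <= x)."""
    return sum(1 for s in sample if s <= x) / len(sample)


def empirical_quantile(sample, alpha):
    """Smallest observation q such that empirical_df(sample, q) >= alpha."""
    ordered = sorted(sample)
    k = math.ceil(alpha * len(ordered)) - 1
    return ordered[max(k, 0)]


# Hypothetical inputs: three investments and 10,000 simulated scenarios.
# Independent standard normal changes are an assumption for illustration only.
rng = random.Random(0)
weights = [0.5, 0.3, 0.2]
scenarios = [[rng.gauss(0.0, 1.0) for _ in weights] for _ in range(10_000)]

pnl = [portfolio_pnl(weights, s) for s in scenarios]

mean_pnl = statistics.fmean(pnl)        # estimate of the mean of X
var_pnl = statistics.variance(pnl)      # estimate of the variance of X
q99 = empirical_quantile(pnl, 0.99)     # estimate of the 99th percentile of X

print(f"mean={mean_pnl:.4f} variance={var_pnl:.4f} 99th percentile={q99:.4f}")
```

Replacing the simulated scenarios with observed changes in value gives the simplest empirical estimate; the difficulty discussed in the text lies in choosing and calibrating the joint model that generates the scenarios.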

It should be clear from the outset that good risk measurement is a must. Increas- 
ingly, banking clients demand objective and detailed information on products bought 
and banks can face legal action when this information is found wanting. For any 
product sold, a proper quantification of the underlying risks needs to be explicitly 
made, allowing the client to decide whether or not the product on offer corresponds 
to his or her risk appetite. 


Risk management. In a very general answer to the question of what risk
management is about, Kloman (1990) writes that:


To many analysts, politicians, and academics it is the management of 
environmental and nuclear risks, those technology-generated macro- 
risks that appear to threaten our existence. To bankers and financial 
officers it is the sophisticated use of such techniques as currency hedging 
and interest-rate swaps. To insurance buyers or sellers it is coordination 
of insurable risks and the reduction of insurance costs. To hospital 
administrators it may mean “quality assurance”. To safety professionals 
it is reducing accidents and injuries. In summary, risk management is 
a discipline for living with the possibility that future events may cause 
adverse effects. 


The last phrase in particular (the italics are ours) captures the general essence of 
risk management, although for a financial institution one can perhaps go further. A 
bank’s attitude to risk is not passive and defensive; a bank actively and willingly 
takes on risk, because it seeks a return and this does not come without risk. Indeed 
risk management can be seen as the core competence of an insurance company 
or a bank. By using its expertise, market position and capital structure, a financial 
institution can manage risks by repackaging them and transferring them to markets 
in customized ways. 

Managing the risk is thus related to preserving the flow of profit and to techniques 
like asset liability management (ALM), which might be defined as managing a finan- 
cial institution so as to earn an adequate return on funds invested, and to maintain 
a comfortable surplus of assets beyond liabilities. In Section 1.4 we discuss these
corporate finance issues in more depth from a shareholder’s point of view. 


1.2 A Brief History of Risk Management 


In this section we treat the historical development of risk management by sketching 
some of the innovations and some of the events that have shaped modern risk man- 
agement for the financial industry. We also describe the more recent development 
of regulation in that industry, which has to some extent been prompted by a number 
of recent disasters. 


1.2.1 From Babylon to Wall Street 


Although risk management has been described as “one of the most important inno- 
vations of the 20th century” by Steinherr (1998) and most of the story we tell is 
relatively modern, some concepts that are used in modern risk management, in par- 
ticular derivatives, have been around for longer. In our discussion we stress the 
example of financial derivatives, as these brought the need for increased banking 
regulation very much to the fore. 


The ancient world to the twentieth century. A derivative is a financial instrument 
derived from an underlying asset, such as an option, future or swap. For example, 
a European call option with strike K and maturity T gives the holder the right, but 
not the obligation, to obtain from the seller at maturity the underlying security for 
a price of K; a European put option gives the holder the right to dispose of the 
underlying at a price K. 
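
The value of these two rights at maturity can be made concrete with a short Python sketch. It uses the standard payoff expressions max(S_T - K, 0) for the call and max(K - S_T, 0) for the put, where S_T denotes the price of the underlying at maturity; the symbol S_T, the function names and the numbers are assumptions introduced here for illustration and do not appear in the text.

```python
def call_payoff(s_T, strike):
    """Payoff at maturity of a European call: exercise only if s_T exceeds the strike."""
    return max(s_T - strike, 0.0)


def put_payoff(s_T, strike):
    """Payoff at maturity of a European put: exercise only if s_T is below the strike."""
    return max(strike - s_T, 0.0)


# Hypothetical strike and terminal prices, chosen only to show both regimes.
strike = 100.0
for s_T in (80.0, 100.0, 120.0):
    print(s_T, call_payoff(s_T, strike), put_payoff(s_T, strike))
```

Because the holder has a right and not an obligation, neither payoff is ever negative; the price paid for the option (the premium) is not included here.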

Dunbar (2000) interprets a passage in the Code of Hammurabi from Babylon 
of 1800 BC as being early evidence of the use of the option concept to provide 
financial cover in the event of crop failure. A very explicit mention of options 
appears in Amsterdam towards the end of the seventeenth century and is beautifully 
narrated by Joseph de la Vega in his 1688 Confusión de Confusiones, a discussion 
between a lawyer, a trader and a philosopher observing the activity on the Beurs 
of Amsterdam. Their discussion contains what we now recognize as European call 
and put options, and a description of their use for investment as well as for risk 
management, and even the notion of short selling. In an excellent recent translation 
(de la Vega 1966) we read: 


If I may explain “opsies” [further, I would say that] through the payment 
of the premiums, one hands over values in order to safeguard one’s stock 
or to obtain a profit. One uses them as sails for a happy voyage during 
a beneficent conjuncture and as an anchor of security in a storm. 


After this, de la Vega continues with some explicit examples that would not be out 
of place in any modern finance course on the topic. 

Financial derivatives in general, and options in particular, are not so new. More- 
over, they appear here as instruments to manage risk, “anchors of security in a 
storm”, rather than the inventions of the capitalist devil, the “wild beasts of finance” 
(Steinherr 1998), that many now believe them to be.


Academic innovation in the twentieth century. While the use of risk-management
ideas such as derivatives can be traced further back, it was not until the late twentieth 
century that a theory of valuation for derivatives was developed. This can be seen 
as perhaps the most important milestone in an age of academic developments in the 
general area of quantifying and managing financial risk. 

Before the 1950s the desirability of an investment was mainly equated to its return. 
In his ground-breaking publication of 1952, Harry Markowitz laid the foundation 
of the theory of portfolio selection by mapping the desirability of an investment 
onto a risk–return diagram, where risk was measured using standard deviation (see
Markowitz 1952, 1959). Through the notion of an efficient frontier the portfolio 
manager could optimize the return for a given risk level. The following decades saw 
an explosive growth in risk-management methodology, including such ideas as the 
Sharpe ratio, the Capital Asset Pricing Model (CAPM) and Arbitrage Pricing Theory 
(APT). Numerous extensions and refinements followed, which are now taught in 
any MBA course on finance. 

The famous Black–Scholes–Merton formula for the price of a European call
option appeared in 1973 (see Black and Scholes 1973). The importance of this
formula was underscored in 1997, when the Bank of Sweden Prize in Economic
Sciences in Memory of Alfred Nobel was awarded to Robert Merton and Myron
Scholes (Fischer Black had died some years earlier) “for a new method to determine
the value of derivatives”.


Growth of markets in the twentieth century. The methodology developed for the 
rational pricing and hedging of financial derivatives changed finance. The Wizards 
of Wall Street (i.e. the mathematical specialists conversant in the new methodology) 
have had a significant impact on the development of financial markets over the last 
few decades. Not only did the new option-pricing formula work, it transformed 
the market. When the Chicago Options Exchange first opened in 1973, fewer than
a thousand options were traded on the first day. By 1995, over a million options were
changing hands each day with current nominal values outstanding in the derivatives
markets in the tens of trillions. So great was the role played by the Black–Scholes–Merton
formula in the growth of the new options market that, when the American
stock-market crashed in 1987, the influential business magazine Forbes put the
blame squarely onto that one formula. Scholes himself has said that it was not so 
much the formula that was to blame, but rather that market traders had not become 
sufficiently sophisticated in using it. 

Along with academic innovation, technological developments (mainly on the 
information-technology (IT) side) also laid the foundations for an explosive growth 
in the volume of new risk-management and investment products. This development 
was further aided by worldwide deregulation in the 1980s. Important additional fac- 
tors contributing to an increased demand for risk-management skills and products 
were the oil crises of the 1970s and the 1970 abolition of the Bretton–Woods
system of fixed exchange rates. Both energy prices and foreign exchange risk became
highly volatile risk factors and customers required products to hedge them. The 
1933 Glass-Steagall Act (passed in the US in the aftermath of the 1929 Depression
to prohibit commercial banks from underwriting insurance and most kinds of
securities) indirectly paved the way for the emergence of investment banks, hungry
for new business. Glass-Steagall was replaced in 1999 by the Financial Services Act, 
which repealed many of the former’s key provisions. Today many more companies 
are able to trade and use modern risk-management products. 


Disasters of the 1990s. In January 1992, the president of the New York Federal 
Reserve, E. Gerald Corrigan, speaking at the Annual Mid-Winter Meeting of the 
New York State Bankers Association, said: 


You had all better take a very, very hard look at off-balance-sheet activ- 
ities. The growth and complexity of [these] activities and the nature of 
the credit settlement risk they entail should give us cause for concern. 
... I hope this sounds like a warning, because it is. Off-balance-sheet 
activities [i.e. derivatives] have a role, but they must be managed and 
controlled carefully and they must be understood by top management 
as well as by traders and rocket scientists. 


Corrigan was referring to the growing volume of derivatives on banking books and 
the way they were accounted for. 

Many of us recall the headline “Barings forced to cease trading” in the Financial 
Times on 26 February 1995. A loss of £700 million ruined the oldest merchant 
banking group in the UK (established in 1761). Besides numerous operational errors 
(violating every qualitative guideline in the risk-management handbook), the final 
straw leading to the downfall of Barings was a so-called straddle position on the 
Nikkei held by the bank’s Singapore-based trader Nick Leeson. A straddle is a short 
position in a call and a put with the same strike—such a position allows for a gain 
if the underlying (in this case the Nikkei index) does not move too far up or down. 
There is, however, considerable loss potential if the index moves down (or up) by 
a large amount, and this is precisely what happened when the Kobe earthquake 
occurred. 
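
The profit-and-loss profile of such a position can be sketched in a few lines of Python. The sketch assumes that the seller of the straddle collects a combined premium up front and pays out the call and put payoffs at maturity; the function name, the strike and the premium are hypothetical values chosen for illustration and are not the actual Barings positions.

```python
def short_straddle_pnl(s_T, strike, premium):
    """P&L at maturity of a short call plus a short put with the same strike.

    The seller keeps the premium and pays out whichever option ends in the money.
    """
    call_payout = max(s_T - strike, 0.0)
    put_payout = max(strike - s_T, 0.0)
    return premium - call_payout - put_payout


# Hypothetical index levels around a strike of 100, with a combined premium of 10.
strike, premium = 100.0, 10.0
for s_T in (70.0, 95.0, 100.0, 105.0, 130.0):
    print(s_T, short_straddle_pnl(s_T, strike, premium))
```

The gain is capped at the premium and is earned only if the index stays close to the strike, whereas the loss grows without a cap on the upside and down to the full strike on the downside as the index moves away.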

About three years later, on 17 September 1998, The Observer newspaper, referring 
to the downfall of Long-Term Capital Management (LTCM), summarized the mood 
of the times when it wrote: 


last week, free market economy died. Twenty five years of intellectual 
bullying by the University of Chicago has come to a close. 


The article continued: 


the derivatives markets are a rarefied world. They are peopled with 
individuals with an extraordinary grasp of mathematics—“a strange 
collection of Greeks, misfits and rocket scientists” as one observer put 
it last week. 


And referring to the Black-Scholes formula, the article asked: 


is this really the key to future wealth? Win big, lose bigger.


There were other important cases which led to a widespread discussion of the need
for increased regulation: the Herstatt Bank case in 1974, Metallgesellschaft in 1993 
or Orange County in 1994. See Notes and Comments below for further reading on 
the above. 

The main reason for the general public’s mistrust of these modern tools of finance 
is their perceived triggering effect for crashes and bubbles. Derivatives have without 
doubt played a role in some spectacular cases and as a consequence are looked upon 
with a much more careful regulatory eye. However, they are by now so much part of 
Wall Street (or any financial institution) that serious risk management without these 
tools would be unthinkable. 

Thus it is imperative that mathematicians take a serious interest in derivatives 
and the risks they generate. Who has not yet considered a prepayment option on 
a mortgage or a change from a fixed-interest-rate agreement to a variable one, or 
vice versa (a so-called swap)? Moreover, many life insurance products now have 
options embedded. 


1.2.2 The Road to Regulation 


There is no doubt that regulation goes back a long way, at least to the time of the 
Venetian banks and the early insurance enterprises sprouting in London’s coffee 
shops in the eighteenth century. In those days one would rely to a large extent 
on self-regulation or local regulation, but rules were there. However, key develop- 
ments leading to the present regulatory risk-management framework are very much 
a twentieth century story. 

Much of the regulatory drive originated from the Basel Committee on Banking 
Supervision. This committee was established by the Central-Bank Governors of the 
Group of Ten (G-10) at the end of 1974. The Group of Ten is made up (oddly) of 
eleven industrial countries which consult and cooperate on economic, monetary and 
financial matters. The Basel Committee does not possess any formal supranational 
supervising authority, and hence its conclusions do not have legal force. Rather, it 
formulates broad supervisory standards and guidelines and recommends statements 
of best practice in the expectation that individual authorities will take steps to imple- 
ment them through detailed arrangements—statutory or otherwise—which are best 
suited to their own national system. The summary below is brief. Interested readers 
can consult, for example, Crouhy, Galai and Mark (2001) for further details, and 
should also see Notes and Comments below. 


The first Basel Accord. The first Basel Accord of 1988 on Banking Supervision 
(Basel I) took an important step towards an international minimum capital standard. 
Its main emphasis was on credit risk, by then clearly the most important source of 
risk in the banking industry. In hindsight, however, the first Basel Accord took an 
approach which was fairly coarse and measured risk in an insufficiently differenti- 
ated way. Also the treatment of derivatives was considered unsatisfactory. 


The birth of VaR. In 1993 the G-30 (an influential international body consisting of 
senior representatives of the private and public sectors and academia) published a 
seminal report addressing for the first time so-called off-balance-sheet products, like 
derivatives, in a systematic way. Around the same time, the banking industry clearly 
saw the need for a proper risk management of these new products. At JPMorgan, 
for instance, the famous Weatherstone 4.15 report asked for a one-day, one-page 
summary of the bank’s market risk to be delivered to the chief executive officer 
(CEO) in the late afternoon (hence the “4.15”). Value-at-Risk (VaR) as a market risk 
measure was born and RiskMetrics set an industry-wide standard. 

In a highly dynamic world with round-the-clock market activity, the need for 
instant market valuation of trading positions (known as marking-to-market) became 
a necessity. Moreover, in markets where so many positions (both long and short) were 
written on the same underlyings, managing risks based on simple aggregation of 
nominal positions became unsatisfactory. Banks pushed to be allowed to consider 
netting effects, i.e. the compensation of long versus short positions on the same 
underlying. 

In 1996 the important Amendment to Basel I prescribed a so-called standardized 
model for market risk, but at the same time allowed the bigger (more sophisticated) 
banks to opt for an internal, VaR-based model (i.e. a model developed in house). 
Legal implementation was to be achieved by the year 2000. The coarseness problem 
for credit risk remained unresolved and banks continued to claim that they were not 
given enough incentives to diversify credit portfolios and that the regulatory capital 
rules currently in place were far too risk insensitive. Because of overcharging on 
the regulatory capital side of certain credit positions, banks started shifting business 
away from certain market segments that they perceived as offering a less attractive 
risk-return profile. 


The second Basel Accord. By 2001 a consultative process for a new Basel Accord 
(Basel II) had been initiated; this process is being concluded as this book goes to 
press. The main theme is credit risk, where the aim is that banks can use a finer, more 
risk-sensitive approach to assessing the risk of their credit portfolios. Banks opting 
for a more advanced, so-called internal-ratings-based approach are allowed to use 
internal and/or external credit-rating systems wherever appropriate. The second 
important theme of Basel II is the consideration of operational risk as a new risk 
class. 

Current discussions imply an implementation date of 2007, but there remains an 
ongoing debate on specific details. Industry is participating in several Quantitative 
Impact Studies in order to gauge the risk-capital consequences of the new accord. 
In Section 1.3.1 we will come back to some issues concerning this accord. 


Parallel developments in insurance regulation. It should be stressed that most of 
the above regulatory changes concern the banking world. We are also witnessing 
increasing regulatory pressure on the insurance side, coupled with a drive to com- 
bine the two regulatory frameworks, either institutionally or methodologically. As 
an example, the Joint Forum on Financial Conglomerates (Joint Forum) was estab- 
lished in early 1996 under the aegis of the Basel Committee on Banking Supervi- 
sion, the International Organization of Securities Commissions (IOSCO) and the 
International Association of Insurance Supervisors (IAIS) to take forward the work 
of the so-called Tripartite Group, whose report was released in July 1995. The Joint 
Forum is comprised of an equal number of senior bank, insurance and securities 
supervisors representing each supervisory constituency. 

The process is underway in many countries. For instance, in the UK the Financial 
Services Authority (FSA) is stepping up its supervision across a wide range of finan- 
cial and insurance businesses. The same is happening in the US under the guidance 
of the Securities and Exchange Commission (SEC) and the Fed. In Switzerland, 
discussions are underway between the Bundesamt für Privatversicherungen (BPV) 
and the Eidgenössische Bankenkommission (EBK) concerning a joint supervisory 
office. In Section 1.3.2 we will discuss some of the current, insurance-related sol- 
vency issues. 


1.3 The New Regulatory Framework 


This section is intended to describe in more detail the framework that has emerged 
from the Basel II discussions and the parallel developments in the insurance world. 


1.3.1 Basel II 


On 26 June 2004 the G-10 central-bank governors and heads of supervision endorsed 
the publication of the revised capital framework. The following statement is taken 
from this release. 


The Basel II Framework sets out the details for adopting more risk- 
sensitive minimum capital requirements [Pillar 1] for banking orga- 
nizations. The new framework reinforces these risk-sensitive require- 
ments by laying out principles for banks to assess the adequacy of their 
capital and for supervisors to review such assessments to ensure banks 
have adequate capital to support their risks [Pillar 2]. It also seeks to 
strengthen market discipline by enhancing transparency in banks’ finan- 
cial reporting [Pillar 3]. The text that has been released today reflects the 
results of extensive consultations with supervisors and bankers world- 
wide. It will serve as the basis for national rule-making and approval 
processes to continue and for banking organizations to complete their 
preparations for the new Framework’s implementation. 


The three-pillar concept. As is apparent from the above quote, a key conceptual 
change within the Basel II framework is the introduction of the three-pillar con- 
cept. Through this concept, the Basel Committee aims to achieve a more holistic 
approach to risk management that focuses on the interaction between the different 
risk categories; at the same time the three-pillar concept clearly signals the existing 
difference between quantifiable and non-quantifiable risks. 

Under Pillar 1 banks are required to calculate a minimum capital charge, referred 
to as regulatory capital, with the aim of bringing the quantification of this minimal 
capital more in line with the banks’ economic loss potential. Under the Basel II 
framework there will be a capital charge for credit risk, market risk and, for the first 
time, operational risk. Whereas the treatment of market risk is unchanged relative 
to the 1996 Amendment of the Basel I Capital Accord, the capital charge for credit 
risk has been revised substantially. In computing the capital charge for credit risk 
and operational risk banks may choose between three approaches of increasing risk 
sensitivity and complexity; some details are discussed below. 

It is further recognized that any quantitative approach to risk management should 
be embedded in a well-functioning corporate governance structure. Thus best- 
practice risk management imposes clear constraints on the organization of the insti- 
tution, i.e. the board of directors, management, employees, internal and external 
audit processes. In particular, the board of directors assumes the ultimate responsi- 
bility for oversight of the risk landscape and the formulation of the company’s risk 
appetite. This is where Pillar 2 enters. Through this important pillar, also referred 
to as the supervisory review process, local regulators review the various checks and 
balances put into place. This pillar recognizes the necessity of an effective overview 
of the banks’ internal assessments of their overall risk and ensures that management 
is exercising sound judgement and has set aside adequate capital for the various 
risks. 

Finally, in order to fulfil its promise that increased regulation will also diminish 
systemic risk, clear reporting guidelines on risks carried by financial institutions 
are called for. Pillar 3 seeks to establish market discipline through a better public 
disclosure of risk measures and other information relevant to risk management. 
In particular, banks will have to offer greater insight into the adequacy of their 
capitalization. 


The capital charge for market risk. As discussed in Section 1.2.2, in the aftermath 
of the Basel I proposals in the early 1990s, there was a general interest in improv- 
ing the measurement of market risk, particularly where derivative products were 
concerned. This was addressed in detail in the 1996 Amendment to Basel I, which 
prescribed standardized market risk models but also allowed more sophisticated 
banks to opt for internal VaR models. In Chapter 2 we shall give a detailed discus- 
sion of the calculation of VaR. For the moment it suffices to know that, for instance, 
a 10-day VaR at 99% of $20 million means that our market portfolio will incur a loss 
of $20 million or more with probability 1% by the end of a 10-day holding period, 
if the composition remains fixed over this period. The choice of the holding period 
(10 days) and the confidence level (99%) lies in the hands of the regulators when 
VaR is used for the calculation of regulatory capital. As a consequence of these 
regulations, we have witnessed a quantum leap in the prominence of quantitative 
risk modelling throughout all echelons of financial institutions. 
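
To make the interpretation of the VaR figure concrete, the following minimal sketch estimates a 10-day 99% VaR from simulated losses and checks how often that level is exceeded. The normal loss model, the sample size and the helper name `empirical_var` are assumptions made purely for illustration; the calculation of VaR proper is the subject of Chapter 2.

```python
import math
import random
from statistics import NormalDist


def empirical_var(losses, alpha=0.99):
    """Empirical alpha-quantile of a sample of losses (positive = loss)."""
    ordered = sorted(losses)
    # Smallest order statistic with at least a fraction alpha of the sample at or below it.
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[k]


# Assumption: 10-day losses (in $ millions) are normal with mean zero, and the
# standard deviation is chosen so that the true 99% VaR equals 20.
sigma = 20.0 / NormalDist().inv_cdf(0.99)

rng = random.Random(0)  # fixed seed for reproducibility
losses = [rng.gauss(0.0, sigma) for _ in range(100_000)]

var_99 = empirical_var(losses, alpha=0.99)
exceed_freq = sum(loss > var_99 for loss in losses) / len(losses)

print(f"estimated 10-day 99% VaR: {var_99:.2f} ($ millions)")
print(f"fraction of scenarios with a larger loss: {exceed_freq:.4f}")
```

In this sketch the estimated VaR should be close to 20 and roughly 1% of the simulated 10-day losses exceed it, which is the statement made in the text.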


Credit risk from Basel I to II. In a banking context, by far the oldest risk type to 
be regulated is credit risk. As mentioned in Section 1.2.2, Basel I handled this type 
of risk in a rather coarse way. Under Basel I and II the credit risk of a portfolio is 
assessed as the sum of risk-weighted assets, that is the sum of notional exposures 
weighted by a coefficient reflecting the creditworthiness of the counterparty (the risk 
weight). In Basel I, creditworthiness is split into three crude categories: governments, 
regulated banks and others. For instance, under Basel I, the risk-capital charge for 
a loan to a corporate borrower is five times higher than for a loan to an OECD bank. 
Also, the risk weight for all corporate borrowers is identical, independent of their 
credit-rating category. 

Due to its coarseness, the implementation of Basel I is extremely simple. But 
with the establishment of more detailed credit risk databases, the improvement of 
analytic models, and the rapid growth in the market for credit derivatives, banks have 
pressed regulators to come up with more risk-specific capital-adequacy guidelines. 
This is the main content of the new Basel II proposals, where banks will be allowed 
to choose between standardized approaches or more advanced internal-ratings- 
based (IRB) approaches for handling credit risk. The final choice will, however, 
also depend on the size and complexity of the bank, with the larger, international 
banks having to go for the more advanced models. 

Already the banks opting for the standardized approach can differentiate better 
among the various credit risks in their portfolio, since under the Basel II framework 
the risk sensitivity of the available risk weights has been increased substantially. 
Under the more advanced IRB approach, a bank’s internal assessment of the riskiness 
of a credit exposure is used as an input to the risk-capital calculation. The overall 
capital charge is then computed by aggregating the internal inputs using formulas 
specified by the Basel Committee. While this allows for increased risk sensitivity 
in the IRB capital charge compared with the standardized approach, portfolio and 
diversification effects are not taken into account; this would require the use of fully 
internal models as in the market risk case. This issue is currently being debated 
in the risk community, and it is widely expected that in the longer term a revised 
version of the Basel II Capital Accord allowing for the use of fully internal models 
will come into effect. In Chapter 8, certain aspects of the regulatory treatment of 
credit risk will be discussed in more detail. 


Opening the door to operational risk. A basic premise for Basel II was that, 
whereas the new regulatory framework would enable banks to reduce their credit 
risk capital charge through internal credit risk models, the overall size of regulatory 
capital throughout the industry should stay unchanged under the new rules. This 
opened the door for the new risk category of operational risk, which we discuss in 
more depth in Section 10.1. Recall that Basel II defines operational risk as the risk 
of losses resulting from inadequate or failed internal processes, people and systems 
or from external events. The introduction of this new risk class has led to heated 
discussions among the various stakeholders. Whereas everyone agrees that risks 
like human risk (e.g. incompetence, fraud), process risk (e.g. model, transaction 
and operational control risk) and technology risk (e.g. system failure, programming 
error) are important, much disagreement exists on how far one should (or can) go 
towards quantifying such risks. This becomes particularly difficult when the finan- 
cially more important risks like fraud and litigation are taken into account. Nobody 
doubts the importance of operational risk for the financial and insurance sector, but 
much less agreement exists on how to measure this risk. 


The Cooke ratio. A crude measure of capitalization is the well-known Cooke ratio, 
which specifies that capital should be at least 8% of the risk-weighted assets of a 
company. The precise definition of risk capital is rather complex, involving various 
tiers of differing liquidity and legal character, and is very much related to existing 
accounting standards. For more detail see, for example, Crouhy, Galai and Mark 
(2001). 
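
The arithmetic behind risk-weighted assets and the Cooke ratio can be sketched as follows. The three risk weights are illustrative values chosen to match the Basel I description above (in particular, a corporate loan attracts five times the charge of a loan to an OECD bank); the portfolio and all function names are hypothetical.

```python
# Illustrative Basel I-style risk weights for the three crude categories
# (assumed values; "other" includes all corporate borrowers).
RISK_WEIGHTS = {"government": 0.0, "bank": 0.2, "other": 1.0}
COOKE_RATIO = 0.08  # capital must be at least 8% of risk-weighted assets


def risk_weighted_assets(exposures):
    """Sum of notional exposures weighted by the counterparty's risk weight."""
    return sum(notional * RISK_WEIGHTS[category] for category, notional in exposures)


def minimum_capital(exposures):
    """Minimum capital implied by the Cooke ratio."""
    return COOKE_RATIO * risk_weighted_assets(exposures)


# Hypothetical portfolio, notionals in millions.
portfolio = [("government", 100.0), ("bank", 50.0), ("other", 200.0)]

rwa = risk_weighted_assets(portfolio)
capital = minimum_capital(portfolio)
print(f"risk-weighted assets: {rwa:.1f}, minimum capital: {capital:.2f}")
```

Note that in this scheme every exposure in the "other" category receives the same weight, whatever the borrower's credit rating; this is the coarseness that the text attributes to Basel I.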


Some criticism. The benefits arising from the regulation of financial services 
are not generally in doubt. Customer-protection acts, basic corporate governance, 
clear guidelines on fair and comparable accounting rules, the ongoing pressure for 
transparent customer and shareholder information on solvency, capital- and risk- 
management issues are all positive developments. Despite these positive points, the 
specific proposals of Basel II have also elicited criticism; issues that have been raised 
include the following. 


• The cost factor of setting up a well-functioning risk-management system 
compliant with the present regulatory framework is significant, especially (in 
relative terms) for smaller institutions. 


• So-called risk-management herding can take place, whereby institutions following 
similar (perhaps VaR-based) rules may all be running for the same 
exit in times of crises, consequently destabilizing an already precarious situa- 
tion even further. This herding phenomenon has been suggested in connection 
with the 1987 crash and the events surrounding the 1998 LTCM crisis. On a 
related note, the procyclical effects of financial regulation, whereby capital 
requirements may rise in times of recession and fall in times of expansion, 
may contribute negatively to the availability of liquidity in moments where 
the latter is most needed. 


• Regulation could lead to overconfidence in the quality of statistical risk measures 
and tools. 


Several critical discussions have taken place questioning to what extent the 
crocodile of regulatory risk management is eating its own tail. In an article of 12 June 
1999, The Economist wrote that “attempts to measure and put a price on risk in financial 
markets may actually be making them riskier”; on the first page of the article, 
entitled “The price of uncertainty”, the proverbial crocodile appeared. The reader 
should be aware that there are several aspects to the overall regulatory side of risk 
management which warrant further discussion. As so often, “the truth” of what con- 
stitutes good and proper supervision will no doubt be somewhere between the more 
extreme views. The Basel process has the very laudable aspect that constructive 
criticism is taken seriously. 


1.3.2 Solvency 2 


In this section we take a brief look at regulatory developments regarding risk man- 
agement in the insurance sector. We concentrate on the current solvency discussion, 
also referred to as Solvency 2. The following statement, made by the EU Insurance 
Solvency Sub-Committee (2001), focuses on the differences between the Basel II 
and Solvency 2 frameworks. 


The difference between the two prudential regimes goes further in 
that their actual objectives differ. The prudential objective of the Basel 
Accord is to reinforce the soundness and stability of the international 
banking system. To that end, the initial Basel Accord and the draft 
New Accord are directed primarily at banks that are internationally 
active. The draft New Accord attaches particular importance to the 
self-regulating mechanisms of a market where practitioners are depen- 
dent on one another. In the insurance sector, the purpose of pruden- 
tial supervision is to protect policyholders against the risk of (iso- 
lated) bankruptcy facing every insurance company. The systematic risk, 
assuming that it exists in the insurance sector, has not been deemed to be 
of sufficient concern to warrant minimum harmonisation of prudential 
supervisory regimes at international level; nor has it been the driving 
force behind European harmonisation in this field. 


More so than in the case of banking regulation, the regulatory framework for insur- 
ance companies has a strong local flavour where many local statutory rules prevail. 
The various solvency committees in EU member countries and beyond are trying to 
come up with some global principles which would be binding on a larger geograph- 
ical scale. We discuss some of the more recent developments below. 


From Solvency 1 to 2. The first EU non-life and life directives on solvency margins 
appeared around 1970. A solvency margin was defined as an extra capital buffer against 
unforeseen events such as higher than expected claims levels or unfavourable investment 
results. In 1997, the Müller report appeared under the heading “Solvency of 
insurance undertakings”; this led to a review of the solvency rules and initiated 
the Solvency 1 project, which was completed in 2002 and came into force in 2004. 
Meanwhile, Solvency 2 was initiated in 2001 with the publication of the influen- 
tial Sharma report—the detailed technical rules of Solvency 2 are currently being 
worked out. 

Solvency 1 was a rather coarse framework calling for a minimum guarantee 
fund (minimal capital required) of €3 million, and a solvency margin consisting of 
16-18% of non-life premiums together with 4% of the technical provisions for life. 
This led to a single, robust system which is easy to understand and inexpensive to 
monitor. However, on the negative side, it is mainly volume based and not explic- 
itly risk based; issues like guarantees, embedded options and proper matching of 
assets and liabilities were largely neglected in many countries. These and further 
shortcomings will be addressed in Solvency 2. 
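
The volume-based character of Solvency 1 is easy to see in a small calculation. The sketch below uses the figures quoted above (a €3 million guarantee fund, 16-18% of non-life premiums, 4% of life technical provisions); treating the guarantee fund as a simple floor on the margin is a simplifying assumption, and the function name and example figures are hypothetical.

```python
MIN_GUARANTEE_FUND = 3.0     # EUR millions, as quoted in the text
LIFE_PROVISION_RATE = 0.04   # 4% of life technical provisions


def solvency1_margin(nonlife_premiums, life_provisions, premium_rate=0.16):
    """Volume-based solvency margin in EUR millions.

    premium_rate must lie in the 16-18% range quoted in the text.
    """
    if not 0.16 <= premium_rate <= 0.18:
        raise ValueError("premium_rate should lie between 0.16 and 0.18")
    margin = premium_rate * nonlife_premiums + LIFE_PROVISION_RATE * life_provisions
    # Simplifying assumption: the minimum guarantee fund acts as a floor.
    return max(margin, MIN_GUARANTEE_FUND)


# Two hypothetical non-life insurers with the same premium volume receive the
# same requirement, however different their underlying risks may be.
print(solvency1_margin(nonlife_premiums=100.0, life_provisions=0.0))
# A very small insurer is caught by the guarantee-fund floor.
print(solvency1_margin(nonlife_premiums=10.0, life_provisions=0.0))
```

Nothing in the formula depends on guarantees, embedded options or the matching of assets and liabilities, which illustrates why the framework is described as not explicitly risk based.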

At the heart of Solvency 2 lies a risk-oriented assessment of overall solvency, 
honouring the three-pillar concept from Basel II (see the previous section). Insurers 
are encouraged to measure and manage their risks based on internal models. Con- 
sistency between Solvency 2 (Insurance) and Basel II (Banking) is adhered to as 
much as possible. The new framework should allow for an efficient supervision of 
insurance groups (holdings) and financial conglomerates (bank-assurance). From 
the start, an increased harmonization of supervisory methodology between the dif- 
ferent legislative entities was envisaged, based on a wide international cooperation 
with actuarial, financial and accounting bodies. 

Without entering into the specifics of the framework, the following points related 
to Pillar 1 should be mentioned. In principle, all risks are to be analysed including 
underwriting, credit, market, operational (corresponding to internal operational risk 
under Basel II), liquidity and event risk (corresponding to external operational risk 
under Basel II). Strong emphasis is put on the modelling of interdependencies and 
a detailed analysis of stress tests. The system should be as much as possible principle 
based rather than rules based and should lead to prudent regulation which focuses on 
the total balance sheet, handling assets and liabilities in a single common framework. 

The final decision on solvency is based on a two-tier procedure. This involves 
setting a first safety barrier at the level of the so-called target capital based on risk- 
sensitive, market-consistent valuation; breaches of this early-warning level would 
trigger regulatory intervention. The second and final tier is the minimal capital level 
calculated with the old Solvency 1 rules. It is interesting to note that in the defini- 
tion of target capital, the expected shortfall for a holding period is used as a risk 
measure rather than Value-at-Risk, reflecting actuaries’ experience with skewed and 
heavy-tailed pay-off functions; this alternative risk measure will be defined in Sec- 
tion 2.2.4. The reader interested in finding out more about the ongoing developments 
in insurance regulation will find relevant references in Notes and Comments. 
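
As a rough illustration of why expected shortfall is attractive for heavy-tailed losses, the following sketch compares an empirical VaR with an empirical expected shortfall on a simulated sample. The Pareto loss model, the tail index and the simple tail-average estimator are assumptions made for illustration only; the formal definition is given in Section 2.2.4.

```python
import math
import random


def empirical_var(losses, alpha):
    """Empirical alpha-quantile of a sample of losses."""
    ordered = sorted(losses)
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[k]


def empirical_es(losses, alpha):
    """Average of the losses at or beyond the empirical VaR (a simple estimator)."""
    threshold = empirical_var(losses, alpha)
    tail = [loss for loss in losses if loss >= threshold]
    return sum(tail) / len(tail)


rng = random.Random(1)  # fixed seed for reproducibility
# Hypothetical heavy-tailed losses: Pareto distributed with tail index 3.
losses = [rng.paretovariate(3.0) for _ in range(50_000)]

var_99 = empirical_var(losses, 0.99)
es_99 = empirical_es(losses, 0.99)
print(f"99% VaR: {var_99:.2f}, 99% expected shortfall: {es_99:.2f}")
```

VaR reports only the threshold that is exceeded with 1% probability, whereas the expected shortfall also reflects how large the losses beyond that threshold are; for heavy-tailed samples the gap between the two can be substantial.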


1.4 Why Manage Financial Risk? 


An important issue that we have barely dealt with concerns the reasons why we 
should invest in QRM in the first place. This question can be posed from various 
perspectives, including those of the customer of a financial institution, its sharehold- 
ers, management, board of directors, regulators, politicians, or the public at large. 
Each of these stakeholders may have a different answer, and, at the end of the day, an 
equilibrium between the various interests will have to be found. In this section, we 
will focus on some of the players involved and give a selective account of some of 
the issues. It is not our aim, nor do we have the competence, to give a full treatment 
of this important subject. 


1.4.1 A Societal View 


Modern society relies on the smooth functioning of banking and insurance systems 
and has a collective interest in the stability of such systems. The regulatory process 
culminating in Basel II has been strongly motivated by the fear of systemic risk, 
i.e. the danger that problems in a single financial institution may spill over and, in 
extreme situations, disrupt the normal functioning of the entire financial system. 
Consider the following remarks made by Alan Greenspan before the Council on 
Foreign Relations in Washington, DC, on 19 November 2002 (Greenspan 2002). 


Today, I would like to share with you some of the evolving international 
financial issues that have so engaged us at the Federal Reserve over the 
past year. I, particularly, have been focusing on innovations in the management 
of risk and some of the implications of those innovations for 
our global economic and financial system. ... The development of our 
paradigms for containing risk has emphasized dispersion of risk to those 
willing, and presumably able, to bear it. If risk is properly dispersed, 
shocks to the overall economic systems will be better absorbed and less 
likely to create cascading failures that could threaten financial stability. 


In the face of such spillover scenarios, society views risk management positively 
and entrusts regulators with the task of forging the framework that will safeguard 
its interests. Consider the debate surrounding the use and misuse of derivatives. 
Regulation serves to reduce the risk of the misuse of these products, but at the same 
time recognizes their societal value in the global financial system. Perhaps contrary 
to the popular view, derivatives should be seen as instruments that serve to enhance 
stability of the system rather than undermine it, as argued by Greenspan in the same 
address. 


Financial derivatives, more generally, have grown at a phenomenal pace 
over the past fifteen years. Conceptual advances in pricing options and 
other complex financial products, along with improvements in computer 
and telecommunications technologies, have significantly lowered the 
costs of, and expanded the opportunities for, hedging risks that were 
not readily deflected in earlier decades. Moreover, the counterparty 
credit risk associated with the use of derivative instruments has been 
mitigated by legally enforceable netting and through the growing use 
of collateral agreements. These increasingly complex financial instru- 
ments have especially contributed, particularly over the past couple of 
stressful years, to the development of a far more flexible, efficient, and 
resilient financial system than existed just a quarter-century ago. 


1.4.2 The Shareholder’s View 


It is widely believed that proper financial risk management can increase the value 
of a corporation and hence shareholder value. In fact, this is the main reason why 
corporations which are not subject to regulation by financial supervisory authori- 
ties engage in risk-management activities. Understanding the relationship between 
shareholder value and financial risk management also has important implications 
for the design of risk-management (RM) systems. Questions to be answered include 
the following. 


• When does RM increase the value of a firm, and which risks should be managed? 

• How should RM concerns factor into investment policy and capital budgeting? 

There is a rather extensive corporate finance literature on the issue of “corporate risk 
management and shareholder value”. We briefly discuss some of the main arguments. 
In this way we hope to alert the reader to the fact that there is more to RM than 
the mainly technical questions related to the implementation of RM strategies dealt 
with in the core of this book. 

The first thing to note is that from a corporate-finance perspective it is by no means 
obvious that in a world with perfect capital markets RM enhances shareholder value: 
while individual investors are typically risk averse and should therefore manage the 
risk in their portfolios, it is not clear that RM or risk reduction at the corporate 
level, such as hedging a foreign-currency exposure or holding a certain amount 
of risk capital, increases the value of a corporation. The rationale for this—at first 
surprising—observation is simple: if investors have access to perfect capital markets, 
they can do the RM transactions they deem necessary via their own trading and 
diversification. The following statement from the chief investment officer of an 
insurance company exemplifies this line of reasoning: “If our shareholders believe 
that our investment portfolio is too risky, they should short futures on major stock 
market indices”. 

The potential irrelevance of corporate RM for the value of a corporation is an 
immediate consequence of the famous Modigliani-Miller Theorem (Modigliani and 
Miller 1958). This result, which marks the beginning of modern corporate finance 
theory, states that, in an ideal world without taxes, bankruptcy costs and informa- 
tional asymmetries, and with frictionless and arbitrage-free capital markets, the 
financial structure of a firm—and hence also its RM decisions—are irrelevant for 
the firm’s value. Hence, in order to find reasons for corporate RM, one has to “turn 
the Modigliani-Miller Theorem upside down” and identify situations where RM 
enhances the value of a firm by deviating from the unrealistically strong assump- 
tions of the theorem. This leads to the following rationales for RM. 


• RM can reduce tax costs. Under a typical tax regime the amount of tax to 
be paid by a corporation is a convex function of its profits; by reducing the 
variability in a firm’s cash flow, RM can therefore lead to a higher expected 
after-tax profit. 


• RM can be beneficial, since a company may (and usually will) have better 
access to capital markets than individual investors. 


• RM can increase the firm value in the presence of bankruptcy costs, as it 
makes bankruptcy less likely. 


• RM can reduce the impact of costly external financing on the firm value, as it 
facilitates the achievement of optimal investment. 
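
The tax argument in the first rationale above is an application of Jensen's inequality, and a two-scenario example makes it concrete. The tax schedule (a flat rate on profits with no refund on losses), the rate and the cash-flow figures below are all hypothetical.

```python
TAX_RATE = 0.30  # hypothetical flat rate on positive profits


def tax(profit):
    """Convex tax schedule: profits are taxed, losses earn no refund."""
    return TAX_RATE * max(profit, 0.0)


def expected_after_tax(scenarios):
    """Expected after-tax profit; scenarios is a list of (probability, pre-tax profit)."""
    return sum(p * (profit - tax(profit)) for p, profit in scenarios)


# Both cash-flow profiles have the same expected pre-tax profit of 50.
unhedged = [(0.5, -50.0), (0.5, 150.0)]
hedged = [(1.0, 50.0)]

print(f"unhedged: {expected_after_tax(unhedged):.2f}")
print(f"hedged:   {expected_after_tax(hedged):.2f}")
```

Because the tax function is convex, the expected tax bill is higher for the variable cash flow, so the smoothed profile yields the larger expected after-tax profit even though the expected pre-tax profit is identical.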


The last two points merit a more detailed discussion. Bankruptcy costs consist of 
direct bankruptcy costs, such as the cost of lawsuits, and the more important indirect 
bankruptcy costs. The latter may include liquidation costs, which can be substantial 
in the case of intangibles like research and development (R&D) and know-how. 
This is why high R&D spending appears to be positively related to the use of 
RM techniques. Moreover, increased likelihood of bankruptcy often has a negative 
effect on key employees, management and customer relations, in particular in areas 
where a client wants a long-term business relationship. For instance, few customers 
would want to enter into a life insurance contract with an insurance company which 
is known to be close to bankruptcy. On a related note, banks which are close to 
bankruptcy might be faced with the unpalatable prospect of a bank run, where 
depositors try to withdraw their money simultaneously. A further discussion of 
these issues is given in Altman (1993). 

It is a “stylized fact of corporate finance” that for a corporation external funds are 
more costly to obtain than internal funds, an observation which is usually attributed 
to problems of asymmetric information between the management of a corporation 
and bond and equity investors. For instance, raising external capital from outsiders 
by issuing new shares might be costly if the new investors, who have incomplete 
information about the economic prospects of a firm, interpret the share issue as 
a sign that the firm is overvalued. This can generate a rationale for RM for the 
following reason: without RM the increased variability of a company’s cash flow 
will be translated either into an increased variability of the funds which need to be 
raised externally or into an increased variability in the amount of investment. With 
increasing marginal costs of raising external capital and decreasing marginal profits 
from new investment, this leads to a decrease in (expected) profits. Hence proper 
RM, which amounts to a smoothing of the cash flow generated by a corporation, 
can be beneficial. For references to the literature see Notes and Comments. 
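
The tax rationale and the external-financing argument above both rest on Jensen's inequality: if after-tax (or after-financing-cost) profit is a concave function of the pre-tax cash flow, then reducing the variability of the cash flow while keeping its mean fixed raises the expected profit. The following Python sketch illustrates the tax case by Monte Carlo simulation. The two-bracket tax schedule, the normal distribution of pre-tax profits and all numbers are hypothetical assumptions chosen purely for illustration; they are not taken from the text.

```python
import random


def tax(profit: float) -> float:
    """Hypothetical convex tax schedule (an assumption for illustration):
    no tax on losses, 20% on the first 100 units of profit, 40% above that."""
    if profit <= 0.0:
        return 0.0
    return 0.2 * min(profit, 100.0) + 0.4 * max(profit - 100.0, 0.0)


def expected_after_tax_profit(mean: float, stdev: float,
                              n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[profit - tax(profit)] for normally
    distributed pre-tax profit (normality is an illustrative assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        profit = rng.gauss(mean, stdev)
        total += profit - tax(profit)
    return total / n


# Same expected pre-tax profit, different cash-flow variability.
volatile = expected_after_tax_profit(mean=100.0, stdev=80.0)
smoothed = expected_after_tax_profit(mean=100.0, stdev=10.0)

print(f"expected after-tax profit, volatile cash flow: {volatile:.2f}")
print(f"expected after-tax profit, smoothed cash flow: {smoothed:.2f}")
```

Under these assumptions the smoothed cash flow yields the higher expected after-tax profit, and neither figure exceeds the after-tax profit of a certain cash flow of 100. The same mechanism operates when the concavity comes from increasing marginal costs of external capital rather than from taxes.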


1.4.3 Economic Capital 


As we have just seen, a corporation typically has strong incentives to strictly limit 
the probability of bankruptcy in order to avoid the associated bankruptcy costs. 
This is directly linked to the notion of economic capital. In a narrow sense, eco- 
nomic capital is the capital that shareholders should invest in the company in 
order to limit the probability of default to a given confidence level over a given 
time horizon. More broadly, economic capital offers a firm-wide language for dis- 
cussing and pricing risk that is related directly to the principal concerns of man- 
agement and other key stakeholders, namely institutional solvency and profitability 
(see Matten 2000). In this broader sense, economic capital represents the emerg- 
ing best practice for measuring and reporting all kinds of risk across a financial 
organization. 

Economic capital is so called because it measures risk in terms of economic reali- 
ties rather than potentially misleading regulatory or accounting rules; moreover, part 
of the measurement process involves converting a risk distribution into the amount 
of capital that is required to support the risk, in line with the institution’s target 
financial strength (e.g. credit rating). Hence the calculation of economic capital is 
a process that begins with the quantification of the risks that any given company 
faces over a given time period. These risks include those that are well defined from 
a regulatory point of view, such as credit, market and operational risks, and also 
include other categories like insurance, liquidity, reputational and strategic or business 
risk. When these risks are modelled in detail and aggregated, one obtains a value distribution 
in line with the Merton model for firm valuation as discussed in Chapter 8. 


Given such a value distribution, the next step involves the determination of the 
probability of default (solvency standard) that is acceptable to the institution. The 
mapping from risk (solvency standard) to capital often uses standard external bench- 
marks for credit risk. For instance, a firm that capitalizes to Moody’s Aa standard 
over a one-year horizon determines its economic capital as the “cushion” required 
to keep the firm solvent over a one-year period with 99.97% probability; firms rated 
Aa by Moody’s have historically defaulted with a 0.03% frequency over a one-year 
horizon (see, for example, Duffie and Singleton 2003, Table 4.2). The choice of hori- 
zon must relate to natural capital planning or business cycles, which might mean 
one year for a bank but typically longer for an insurance company. In the ideal RM 
set-up, it is economic capital that is used for setting risk limits. Or, as stated in 
Drzik, Nakada and Schuermann (1998), economic capital can serve as a common 
currency for risk limits. That paper also discusses the way in which economic capital 
(capital you need) can be compared with physical capital (capital you have) and how 
corporate-finance decisions can be based on this comparison. 
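
The mapping from a solvency standard to a capital figure can be sketched numerically. In the following Python illustration economic capital is taken to be the empirical quantile of a simulated one-year loss distribution at the target solvency probability (99.97% for the Aa example above). The normal loss model, its standard deviation and the helper names are hypothetical assumptions made for the sake of a short example; a realistic value distribution would come from the aggregated risk models described in the text, and some practitioners subtract the expected loss from the quantile.

```python
import math
import random


def empirical_quantile(sample: list[float], alpha: float) -> float:
    """Smallest order statistic whose empirical cdf value is at least alpha."""
    ordered = sorted(sample)
    index = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[index]


def simulate_annual_losses(n: int, stdev: float, seed: int = 7) -> list[float]:
    """Hypothetical one-year losses; normality is assumed only for illustration."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, stdev) for _ in range(n)]


losses = simulate_annual_losses(n=200_000, stdev=100.0)

# Capital cushion that keeps the firm solvent with 99.97% probability.
capital_aa = empirical_quantile(losses, 0.9997)
# A weaker solvency standard requires less capital.
capital_99 = empirical_quantile(losses, 0.99)

# Fraction of simulated years in which the loss exceeds the Aa cushion.
default_frequency = sum(loss > capital_aa for loss in losses) / len(losses)

print(f"capital at 99.97%: {capital_aa:.1f}")
print(f"capital at 99.00%: {capital_99:.1f}")
print(f"simulated default frequency: {default_frequency:.5f}")
```

Because the 99.97% quantile lies far in the tail, its estimate is driven by a handful of simulated observations, which is one reason why the modelling of extremes receives so much attention in QRM.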

We hope that our brief discussion of the economic issues surrounding modern 
RM has convinced the reader that there is more to RM than the mere statistical 
computation of risk measures, important though the latter may be. The Notes and 
Comments provide some references for readers who want to learn more about the 
economic foundations of RM. 


1.5 Quantitative Risk Management 


In this first chapter we have tried to place QRM in a larger historical, institutional, 
and even societal framework, since a study of QRM without a discussion of its 
proper setting and motivation makes little sense. In the remainder of the book we 
adopt a somewhat narrower view and treat QRM as a quantitative science using the 
language of mathematics in general, and probability and statistics in particular. 

In this section we describe the challenge that we have attempted to meet in this 
book and discuss where QRM may lead in the future. 


1.5.1 The Nature of the Challenge 


We set ourselves the task of defining a new discipline of QRM and our approach 
to this task has two main strands. On the one hand, we have attempted to put 
current practice onto a firmer mathematical footing where, for example, concepts 
like profit-and-loss distributions, risk factors, risk measures, capital allocation and 
risk aggregation are given formal definitions and a consistent notation. In doing this 
we have been guided by the consideration of what topics should form the core of a 
course on QRM for a wide audience of students interested in RM issues; nonetheless, 
the list is far from complete and will continue to evolve as the discipline matures. On 
the other hand, the second strand of our endeavour has been to put together material 
on techniques and tools which go beyond current practice and address some of the 
deficiencies that have been raised repeatedly by critics. In the following paragraphs 
we elaborate on some of these issues. 


Extremes matter. A very important challenge in QRM, and one that makes it particularly 
interesting as a field for probability and statistics, is the need to address 
unexpected, abnormal or extreme outcomes, rather than the expected, normal or 
average outcomes that are the focus of many classical applications. This is in tune 
with the regulatory view expressed by Alan Greenspan: 


From the point of view of the risk manager, inappropriate use of the 
normal distribution can lead to an understatement of risk, which must be 
balanced against the significant advantage of simplification. From the 
central bank’s corner, the consequences are even more serious because 
we often need to concentrate on the left tail of the distribution in for- 
mulating lender-of-last-resort policies. Improving the characterization 
of the distribution of extreme values is of paramount importance. 


Joint Central Bank Research Conference, 1995 


The need for a response to this challenge became very clear in the wake of the LTCM 
case in 1998. John Meriwether, the founder of the hedge fund, clearly learned from 
this experience of extreme financial turbulence; he is quoted as saying: 


With globalisation increasing, you’ll see more crises. Our whole focus 
is on the extremes now—what’s the worst that can happen to you in 
any situation—because we never want to go through that again. 


The Wall Street Journal, 21 August 2000 


Much space is devoted in our book to models for financial risk factors that go beyond 
the normal (or Gaussian) model and attempt to capture the related phenomena of 
heavy tails, volatility and extreme values. 


The interdependence and concentration of risks. A further important challenge 
is presented by the multivariate nature of risk. Whether we look at market risk or 
credit risk, or overall enterprise-wide risk, we are generally interested in some form 
of aggregate risk that depends on high-dimensional vectors of underlying risk factors 
such as individual asset values in market risk, or credit spreads and counterparty 
default indicators in credit risk. 

A particular concern in our multivariate modelling is the phenomenon of depend- 
ence between extreme outcomes, when many risk factors move against us simulta- 
neously. Again in connection with the LTCM case we find the following quote in 
Business Week, September 1998. 


Extreme, synchronized rises and falls in financial markets occur infre- 
quently but they do occur. The problem with the models is that they did 
not assign a high enough chance of occurrence to the scenario in which 
many things go wrong at the same time—the “perfect storm” scenario. 


In a perfect storm scenario the risk manager discovers that the diversification he 
thought he had is illusory; practitioners describe this also as a concentration of risk. 
Myron Scholes, a prominent figure in the development of RM, alludes to this in 
Scholes (2000), where he argues against the regulatory overemphasis of VaR in the 
face of the more important issue of co-movements in times of market stress: 


Over the last number of years, regulators have encouraged financial 
entities to use portfolio theory to produce dynamic measures of risk. 
VaR, the product of portfolio theory, is used for short-run, day-to-day 
profit-and-loss exposures. Now is the time to encourage the BIS and 
other regulatory bodies to support studies on stress test and concen- 
tration methodologies. Planning for crises is more important than VaR 
analysis. And such new methodologies are the correct response to recent 
crises in the financial industry. 


The problem of scale. A further challenge in QRM is the typical scale of the 
portfolios under consideration; in the most general case a portfolio may represent 
the entire position in risky assets of a financial institution. Calibration of detailed 
multivariate models for all risk factors is a well-nigh impossible task and hence any 
sensible strategy involves dimension reduction, that is to say the identification of 
key risk drivers and a concentration on modelling the main features of the overall 
risk landscape. 

In short we are forced to adopt a fairly “broad-brush” approach. Where we use 
econometric tools, such as models for financial return series, we are content with rel- 
atively simple descriptions of individual series which capture the main phenomenon 
of volatility, and which can be used in a parsimonious multivariate factor model. 
Similarly, in the context of portfolio credit risk, we are more concerned with finding 
suitable models for the default dependence of counterparties than with accurately 
describing the mechanism for the default of an individual, since it is our belief that 
the former is at least as important as the latter in determining the risk of a large 
diversified portfolio. 


Interdisciplinarity. Another aspect of the challenge of QRM is the fact that ideas 
and techniques from several existing quantitative disciplines are drawn together. 
When one considers the ideal education for a quantitative risk manager of the future, 
then no doubt a combined quantitative skillset should include concepts, techniques 
and tools from such fields as mathematical finance, statistics, financial econometrics, 
financial economics and actuarial mathematics. Our choice of topics is strongly 
guided by a firm belief that the inclusion of modern statistical and econometric 
techniques and a well-chosen subset of actuarial methodology are essential for the 
establishment of best-practice QRM. Certainly QRM is not just about financial 
mathematics and derivative pricing, important though these may be. 

Of course, the quantitative risk manager operates in an environment where addi- 
tional non-quantitative skills are equally important. Communication is certainly the 
most important skill of all, as a risk professional by definition of his/her duties will 
have to interact with colleagues with diverse training and background at all levels 
of the organization. Moreover, a quantitative risk manager has to familiarize him or 
herself quickly with all-important market practice and institutional details. Finally, a 
certain degree of humility will also be required to recognize the role of quantitative 
risk management in a much larger picture. 


1.5.2 QRM for the Future 


It cannot be denied that the use of QRM in the insurance and banking industry has 
had an overall positive impact on the development of those industries. However, RM 
technology is not restricted to the financial-services industry and similar develop- 
ments are taking place in other sectors of industry. Some of the earliest applications 
of QRM are to be found in the manufacturing industry, where similar concepts and 
tools exist under names like reliability or total quality control. Industrial companies 
have long recognized the risks associated with bringing faulty products to the mar- 
ket. The car manufacturing industry in Japan in particular has been an early driving 
force in this respect. 

More recently, QRM techniques have been adopted in the transport and energy 
industries, to name but two. In the case of energy there are obvious similarities 
with financial markets: electrical power is traded on energy exchanges; derivatives 
contracts are used to hedge future price uncertainty; companies optimize investment 
portfolios combining energy products with financial products; a current debate in 
the industry concerns the extent to which existing Basel II methodology can be 
transferred to the energy sector. However, there are also important dissimilarities 
due to the specific nature of the industry; most importantly there is the issue of 
the cost of storage and transport of electricity as an underlying commodity and the 
necessity of modelling physical networks including the constraints imposed by the 
existence of national boundaries and quasi-monopolies. 

A further exciting area concerns the establishment of markets for environmental 
emission allowances. For example, the Chicago Climate Futures Exchange (CCFE) 
currently offers futures contracts on sulphur dioxide emissions. These are traded 
by industrial companies producing the pollutant in their manufacturing process and 
force such companies to consider the cost of pollution as a further risk in their risk 
landscape. 

A natural consequence of the evolution of QRM thinking in different industries is 
an interest in the transfer of risks between industries; this process is known as ART 
(alternative risk transfer). To date the best examples are of risk transfer between 
the insurance and banking industries, as illustrated by the establishment in 1992 of 
catastrophe futures by the Chicago Board of Trade. These came about in the wake 
of Hurricane Andrew, which caused $20 billion of insured losses on the East Coast 
of the US. While this was a considerable event for the insurance industry in relation 
to overall reinsurance capacity, it represented only a drop in the ocean compared 
with the daily volumes traded worldwide on financial exchanges. This led to the 
recognition that losses could be covered in future by the issuance of appropriately 
structured bonds with coupon streams and principal repayments dependent on the 
occurrence or non-occurrence of well-defined natural catastrophe events, such as 
storms and earthquakes. 

A speculative view of where these developments may lead is given by Shiller 
(2003), who argues that the proliferation of RM thinking coupled with the tech- 
nological sophistication of the twenty-first century will allow any agent in society, 
from a company to a country to an individual, to apply QRM methodology to the 
risks they face. In the case of an individual this may be the risk of unemployment, 
depreciation in the housing market or the investment in the education of children. 


Notes and Comments 


The language of probability and statistics plays a fundamental role throughout the 
book and readers are expected to have a good knowledge of these subjects. At the 
elementary level, Rice (1995) gives a good first introduction to both of these. More 
advanced texts in probability and stochastic processes are Williams (1991), Resnick 
(1992) and Rogers and Williams (1994); the full depth of these texts is certainly not 
required for the understanding of this book, though they provide excellent reading 
material for the mathematically more sophisticated reader who also has an interest 
in mathematical finance. Further recommended texts on statistical inference include 
Casella and Berger (2002), Bickel and Doksum (2001), Davison (2003) and Lindsey 
(1996). 

An excellent text on the history of risk and probability with financial applications 
in mind is Bernstein (1998). Additional useful material on the history of the subject 
is to be found in Field (2003). 

For the mathematical reader looking to acquire more knowledge of relevant 
economics we recommend Mas-Colell, Whinston and Green (1995) for microeconomics; 
Campbell, Lo and MacKinlay (1997) or Gourieroux and Jasiak (2001) 
for econometrics; and Brealey and Myers (2000) for corporate finance. From the 
vast literature on options, an entry-level text for the general reader is Hull (1997). 
At a more mathematical level we like Bingham and Kiesel (1998) and Musiela and 
Rutkowski (1997). One of the most readable texts on the basic notion of options is 
Cox and Rubinstein (1985). For a rather extensive list of the kind of animals to be 
found in the zoological garden of derivatives, see, for example, Haug (1998). 

There are several texts on the spectacular losses due to speculative trading and 
careless use of derivatives. The LTCM case is well documented in Dunbar (2000), 
Lowenstein (2000) and Jorion (2000), the latter particularly for the technical risk- 
measurement issues involved. Boyle and Boyle (2001) give a very readable account 
of the Orange County, Barings and LTCM stories. A useful website on RM, con- 
taining a growing collection of industry case studies, is www.erisk.com. 

An overview of options embedded in life insurance products is given in Dillmann 
(2002), guarantees are discussed in detail in Hardy (2003), and Briys and de Varenne 
(2001) contains an excellent account of RM issues facing the (life) insurance indus- 
try. 

The historical development of banking regulation is well described in Crouhy, 
Galai and Mark (2001) and Steinherr (1998). For details of the current rules and 
regulations coming from the Basel Committee, see its website at www.bis.org/bcbs. 
Besides copies of the various accords, one also finds useful working papers, publications 
and comments written by stakeholders on the various consultative packages. 
For Solvency 2, many documents are being prepared, and the Web is the best place 
to start looking; a forthcoming text is Sandstrom (2005). The complexity of RM 
methodology in the wake of Basel II is critically addressed by Hawke (2003), in his 
capacity as US Comptroller of the Currency. 

For a very detailed overview of relevant practical issues underlying RM we again 
strongly recommend Crouhy, Galai and Mark (2001). A text stressing the use of VaR 
as a risk measure and containing several worked examples is Jorion (2001), who also 
has a useful teaching manual on the same subject (Jorion 2002a). Insurance-related 
issues in RM are well presented in Doherty (2000). 

For a comprehensive discussion of the management of bank capital given regula- 
tory constraints see Matten (2000). Graham and Rogers (2002) contains a discussion 
of RM and tax incentives. A formal account of the Modigliani–Miller Theorem and 
its implication can be found in many textbooks on corporate finance: a standard 
reference is Brealey and Myers (2000); de Matos (2001) gives a more theoretical 
account from the perspective of modern financial economics. Both texts also discuss 
the implications of informational asymmetries between the various stakeholders in 
a corporation. Formal models looking at RM from a corporate finance angle are to 
be found in Froot and Stein (1998), Froot, Scharfstein and Stein (1993) and Stulz 
(1996, 2002). For a specific discussion on corporate finance issues in insurance see 
Froot (2005) and Hancock, Huber and Koch (2001). 

There are several studies on the use of RM techniques for non-financial firms 
(see, for example, Bodnar, Hayt and Marston 1999; Geman 2005). Two references 
in the area of reliability of industrial processes are Bedford and Cooke (2001) and 
Does, Roes and Trip (1999). An interesting edited volume on alternative risk trans- 
fer (ARTs) is Shimpi (1999); a detailed study of model risk in the ART context is 
Schmock (1999). An area we have not mentioned so far in our discussion of QRM in 
the future is that of real options. A real option is the right, but not the obligation, to 
take an action (e.g. deferring, expanding, contracting or abandoning) at a predeter- 
mined cost called the exercise price for a predetermined period of time—the life of 
the option. This definition is taken from Copeland and Antikarov (2001). Examples 
of real options discussed in the latter are the valuation of an internet project and of 
a pharmaceutical research and development project. A further useful reference is 
Brennan and Trigeorgis (2000). 


2 


Basic Concepts in Risk Management 


In this chapter we discuss essential concepts in quantitative risk management. We 
begin by introducing a probabilistic framework for modelling financial risk and we 
give formal definitions for notions such as risk, profit and loss, risk factors and 
mapping. Moreover, we discuss a number of examples from the areas of market and 
credit risk, illustrating how typical risk-management problems fit into the general 
framework. 

A central issue in modern risk management is the measurement of risk. As 
explained in Chapter 1, the need to quantify risk arises in many different con- 
texts. For instance, a regulator measures the risk exposure of a financial institution 
in order to determine the amount of capital that institution has to hold as a buffer 
against unexpected losses. Similarly, the clearing house of an exchange needs to set 
margin requirements for investors trading on that exchange. In Section 2.2 we give 
an overview of the existing approaches to measuring risk and discuss their strengths 
and weaknesses. Particular attention will be given to Value-at-Risk and the related 
notion of expected shortfall. 

In Section 2.3 we present some standard methods used in the financial industry 
for measuring market risk over a short horizon, such as the variance–covariance 
method, the historical-simulation method and methods based on Monte Carlo simu- 
lation. We consider the use of scaling rules for transforming one-period risk-measure 
estimates into estimates for longer time horizons and give a short discussion of back- 
testing approaches for monitoring the performance of risk-measurement systems. 
We conclude with an example of the application of standard methodology. 


2.1 Risk Factors and Loss Distributions 


2.1.1 General Definitions 


We represent the uncertainty about future states of the world by a probability space 
$(\Omega, \mathcal{F}, P)$, which is the domain of all random variables (rvs) we introduce below. 
Consider a given portfolio such as a collection of stocks or bonds, a book of derivatives, 
a collection of risky loans or even a financial institution’s overall position in 
risky assets. We denote the value of this portfolio at time $s$ by $V(s)$ and assume that 
the rv $V(s)$ is observable at time $s$. For a given time horizon $\Delta$, such as 1 or 10 days, 
the loss of the portfolio over the period $[s, s + \Delta]$ is given by 


$$L_{[s,s+\Delta]} = -(V(s + \Delta) - V(s)).$$


While $L_{[s,s+\Delta]}$ is assumed to be observable at time $s + \Delta$, it is typically random from 
the viewpoint of time $s$. The distribution of $L_{[s,s+\Delta]}$ is termed the loss distribution. 

We distinguish between the conditional loss distribution, i.e. the distribution of 
$L_{[s,s+\Delta]}$ given all available information up to and including time $s$, and the unconditional 
loss distribution; this issue is taken up in more detail below. 


Remark 2.1. Practitioners in risk management are often concerned with the so- 
called profit-and-loss (P&L) distribution. This is the distribution of the change in 
value $V(s + \Delta) - V(s)$, i.e. of the rv $-L_{[s,s+\Delta]}$. However, in risk management 
we are mainly concerned with the probability of large losses and hence with the 
upper tail of the loss distribution. Hence we often drop the P from P&L, both in 
notation and language. It is a standard convention in statistics to present results on 
tail estimation for the upper tail of distributions. Moreover, actuarial risk theory is a 
theory of positive rvs. Hence our focus on loss distributions facilitates the application 
of techniques from these fields. 


In most parts of the book we consider a fixed horizon $\Delta$. In that case it will be convenient 
to measure time in units of $\Delta$ and to introduce a time series notation, where 
we move from a generic process $Y(s)$ to the time series $(Y_t)_{t \in \mathbb{N}}$ with $Y_t := Y(t\Delta)$. 
Using this notation the loss is written as 


$$L_{t+1} = L_{[t\Delta,(t+1)\Delta]} = -(V_{t+1} - V_t). \tag{2.1}$$


For instance, in market risk management we often work with financial models where 
the calendar time $s$ is measured in years and interest rates and volatilities are quoted 
on an annualized basis. If we are interested in daily losses we set $\Delta = 1/365$ or 
$\Delta \approx 1/250$; the latter convention is mainly used in markets for equity derivatives 
since there are approximately 250 trading days per year. The rvs $V_t$ and $V_{t+1}$ then 
represent the portfolio value on days $t$ and $t + 1$, respectively, and $L_{t+1}$ is the loss 
from day $t$ to day $t + 1$. 
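
As a small illustration of the sign convention in (2.1), the following Python sketch turns a series of observed portfolio values into the corresponding series of one-period losses. The function name and the sample values are hypothetical; a fall in portfolio value appears as a positive loss and a rise as a negative loss.

```python
def losses_from_values(values: list[float]) -> list[float]:
    """Return the one-period losses L_{t+1} = -(V_{t+1} - V_t)
    for consecutive portfolio values V_0, V_1, ..., V_n."""
    return [-(v_next - v_now) for v_now, v_next in zip(values, values[1:])]


# Hypothetical daily portfolio values V_0, ..., V_3.
values = [100.0, 98.5, 101.0, 99.0]
losses = losses_from_values(values)

print(losses)  # [1.5, -2.5, 2.0]
```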

Following standard risk-management practice the value $V_t$ is modelled as a function 
of time and a $d$-dimensional random vector $Z_t = (Z_{t,1}, \dots, Z_{t,d})$ of risk factors, 
i.e. we have the representation 


$$V_t = f(t, Z_t) \tag{2.2}$$


for some measurable function $f : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$. Risk factors are usually assumed 
to be observable so that $Z_t$ is known at time $t$. The choice of the risk factors and 
of $f$ is of course a modelling issue and depends on the portfolio at hand and on 
the desired level of precision. Frequently used risk factors are logarithmic prices 
of financial assets, yields and logarithmic exchange rates. A representation of the 
portfolio value in the form (2.2) is termed a mapping of risks. Some examples of 
the mapping of standard portfolios are provided below. 

It will be convenient to define the series of risk-factor changes $(X_t)_{t \in \mathbb{N}}$ by 
$X_t := Z_t - Z_{t-1}$; they are the objects of interest in most statistical studies of financial time 
series. Using the mapping (2.2) the portfolio loss can be written as 


$$L_{t+1} = -(f(t + 1, Z_t + X_{t+1}) - f(t, Z_t)). \tag{2.3}$$


Since $Z_t$ is known at time $t$, the loss distribution is determined by the distribution 
of the risk-factor change $X_{t+1}$. We therefore introduce another piece of notation, 
namely the loss operator $l_{[t]} : \mathbb{R}^d \to \mathbb{R}$, which maps risk-factor changes into losses. 
It is defined by 


$$l_{[t]}(x) := -(f(t + 1, Z_t + x) - f(t, Z_t)), \quad x \in \mathbb{R}^d, \tag{2.4}$$


and we obviously have $L_{t+1} = l_{[t]}(X_{t+1})$. 
If $f$ is differentiable, we consider a first-order approximation $L^{\Delta}_{t+1}$ of the loss in 
(2.3) of the form 


$$L^{\Delta}_{t+1} := -\Bigl(f_t(t, Z_t) + \sum_{i=1}^{d} f_{z_i}(t, Z_t) X_{t+1,i}\Bigr), \tag{2.5}$$


where the subscripts to $f$ denote partial derivatives. The notation $L^{\Delta}$ stems from the 
standard delta terminology in the hedging of derivatives (see Example 2.5 below). 
The linearized loss operator corresponding to (2.5) is given by 


$$l^{\Delta}_{[t]}(x) := -\Bigl(f_t(t, Z_t) + \sum_{i=1}^{d} f_{z_i}(t, Z_t) x_i\Bigr). \tag{2.6}$$

The first-order approximation is convenient as it allows us to represent the loss as 
a linear function of the risk-factor changes. The quality of the approximation (2.5) 
is obviously best if the risk-factor changes are likely to be small (i.e. if we are 
measuring risk over a short horizon) and if the portfolio value is almost linear in the 
risk factors (i.e. if the function f has small second derivatives). 
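
To make the loss operator and its linearized version concrete, the following Python sketch implements both for a simple mapping. As an illustrative assumption (not a definition taken from this section) we use a stock portfolio with fixed holdings and logarithmic prices as risk factors, so that the value does not depend explicitly on time, the partial derivative with respect to time vanishes and the partial derivative with respect to the i-th risk factor equals the holding times the current price. All function names and numbers are hypothetical.

```python
import math
from typing import Callable


def portfolio_value(z: list[float], holdings: list[float]) -> float:
    """Assumed mapping f(t, z) = sum_i holdings_i * exp(z_i), with z_i the
    log price of stock i (no explicit time dependence)."""
    return sum(lam * math.exp(z_i) for lam, z_i in zip(holdings, z))


def loss_operator(z_t: list[float],
                  holdings: list[float]) -> Callable[[list[float]], float]:
    """Full loss operator: x -> -(f(t+1, z_t + x) - f(t, z_t))."""
    def l(x: list[float]) -> float:
        z_next = [z_i + x_i for z_i, x_i in zip(z_t, x)]
        return -(portfolio_value(z_next, holdings)
                 - portfolio_value(z_t, holdings))
    return l


def linearized_loss_operator(z_t: list[float],
                             holdings: list[float]) -> Callable[[list[float]], float]:
    """Linearized loss operator: x -> -(f_t + sum_i f_{z_i} x_i),
    where f_t = 0 and f_{z_i} = holdings_i * exp(z_{t,i}) for this mapping."""
    sensitivities = [lam * math.exp(z_i) for lam, z_i in zip(holdings, z_t)]

    def l_delta(x: list[float]) -> float:
        return -sum(s * x_i for s, x_i in zip(sensitivities, x))
    return l_delta


holdings = [10.0, 5.0]                      # shares held in two stocks
z_t = [math.log(100.0), math.log(50.0)]     # current log prices

l = loss_operator(z_t, holdings)
l_delta = linearized_loss_operator(z_t, holdings)

small_change = [-0.01, 0.005]   # typical daily log returns
large_change = [-0.20, 0.10]    # a much larger move

for x in (small_change, large_change):
    print(f"x = {x}: full loss {l(x):.3f}, linearized loss {l_delta(x):.3f}")
```

In this sketch the two operators nearly agree for the small risk-factor change and differ noticeably for the large one, in line with the remark that the approximation works best over short horizons and for portfolios that are close to linear in the risk factors.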


Remark 2.2. In developing formulas (2.2)-(2.6) we have assumed that time is 
measured in units of the horizon $\Delta$. In order to be in line with market convention 
in our examples it will sometimes be convenient to consider mappings of the form 
$g(s, Z)$, where time $s$ is measured in years; in that case, equations (2.2) and (2.3) 
become, respectively, $V_t = f(t, Z_t) = g(t\Delta, Z_t)$ and 


$$L_{t+1} = -(g((t + 1)\Delta, Z_t + X_{t+1}) - g(t\Delta, Z_t)),$$


where $\Delta$ gives the length of the risk-management horizon in years. Care must be 
taken with the linearized version of the loss in (2.5), which becomes 


$$L^{\Delta}_{t+1} = -\Bigl(g_s(t\Delta, Z_t)\Delta + \sum_{i=1}^{d} g_{z_i}(t\Delta, Z_t) X_{t+1,i}\Bigr). \tag{2.7}$$

Note that, when working with a short time horizon $\Delta$, the term $g_s(t\Delta, Z_t)\Delta$ in (2.7) 
is very small and is therefore often dropped in practice. 
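
The relative size of the two terms in (2.7) can be checked in a simple case. The Python sketch below assumes, purely for illustration, a default-free zero-coupon bond with face value one, priced as exp(-(T - s) y), where the single risk factor y is the continuously compounded yield and calendar time s is measured in years. The maturity, the yield level and the size of the yield change are hypothetical numbers.

```python
import math

MATURITY = 5.0        # assumed maturity T in years
DELTA = 1.0 / 250.0   # one trading day, expressed in years


def bond_price(s: float, y: float) -> float:
    """Assumed mapping g(s, y) = exp(-(T - s) * y)."""
    return math.exp(-(MATURITY - s) * y)


def full_loss(t: int, y: float, x: float) -> float:
    """Loss -(g((t+1)*DELTA, y + x) - g(t*DELTA, y)) for a yield change x."""
    return -(bond_price((t + 1) * DELTA, y + x) - bond_price(t * DELTA, y))


def linearized_terms(t: int, y: float, x: float) -> tuple[float, float]:
    """Return the time term g_s * DELTA and the risk-factor term g_y * x."""
    s = t * DELTA
    price = bond_price(s, y)
    g_s = y * price                  # partial derivative with respect to time
    g_y = -(MATURITY - s) * price    # partial derivative with respect to yield
    return g_s * DELTA, g_y * x


# A yield of 4% that rises by 10 basis points over one day.
time_term, factor_term = linearized_terms(t=0, y=0.04, x=0.001)
linearized_loss = -(time_term + factor_term)
exact_loss = full_loss(t=0, y=0.04, x=0.001)

print(f"time term:        {time_term:.6f}")
print(f"risk-factor term: {factor_term:.6f}")
print(f"linearized loss:  {linearized_loss:.6f}")
print(f"exact loss:       {exact_loss:.6f}")
```

For these inputs the time term is a few per cent of the risk-factor term, which suggests why it is often neglected for daily horizons; for longer horizons or very small yield changes it may no longer be negligible.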


Remark 2.3. Note that our definition of the portfolio loss implicitly assumes that 
the composition of the portfolio remains unchanged over the time horizon $\Delta$. While 
unproblematic for daily losses, this assumption becomes increasingly unrealistic for 
longer time horizons. This is a problem for non-financial corporations like insurance 
companies; such companies may prefer to measure the risk of their financial portfolio 
over a one-year horizon, which is the appropriate horizon for dealing with their usual 
business risks. We note also that in the context of the Basel Accords, discussed in 
Chapter 1, it is formally required that calculations for banks be made under the 
assumption that the portfolio composition remains unchanged over a holding period 
$\Delta$ (10 days in the case of market risk). 


2.1.2 Conditional and Unconditional Loss Distribution 


As mentioned earlier, in risk management we often have to decide if we are interested 
in the conditional or the unconditional distribution of losses. Both are relevant for 
risk-management purposes, but it is important to be aware of the distinction between 
the two concepts. 

The differences between conditional and unconditional loss distributions are 
strongly related to time series properties of the series of risk-factor changes $(X_t)_{t \in \mathbb{N}}$. 
Suppose that the risk-factor changes form a stationary time series with stationary 
distribution $F_X$ on $\mathbb{R}^d$. Essentially, this means that the distribution of $(X_t)_{t \in \mathbb{N}}$ is 
invariant under shifts of time (see Chapter 4 for details) and most time series models 
used in practice for the modelling of risk-factor changes satisfy this property. Now 
fix a point in time $t$ (current time), and denote by $\mathcal{F}_t$ the sigma field representing 
the publicly available information at time $t$. Typically, $\mathcal{F}_t = \sigma(\{X_s : s \le t\})$, the 
sigma field generated by past and present risk-factor changes, often called the history, 
up to and including time $t$. Denote by $F_{X_{t+1} \mid \mathcal{F}_t}$ the conditional distribution of 
$X_{t+1}$ given current information $\mathcal{F}_t$. In most stationary time series models relevant 
for risk management, $F_{X_{t+1} \mid \mathcal{F}_t}$ is not equal to the stationary distribution $F_X$. An 
important example is provided by the popular models from the GARCH family (see 
Section 4.3). In this class of model the variance of the conditional distribution of 
$X_{t+1}$ is a function of past risk-factor changes and possibly of its own lagged values. 
On the other hand, if $(X_t)_{t \in \mathbb{N}}$ is an independent and identically distributed (iid) 
series, we obviously have $F_{X_{t+1} \mid \mathcal{F}_t} = F_X$. 

Fix the loss operator $l_{[t]}$ corresponding to the portfolio currently under
consideration. The conditional loss distribution $F_{L_{t+1} \mid \mathcal{F}_t}$ is defined
as the distribution of the loss operator $l_{[t]}(\cdot)$ under
$F_{X_{t+1} \mid \mathcal{F}_t}$. Formally we have, for $l \in \mathbb{R}$,

$$F_{L_{t+1} \mid \mathcal{F}_t}(l) = P(l_{[t]}(X_{t+1}) \le l \mid \mathcal{F}_t) = P(L_{t+1} \le l \mid \mathcal{F}_t),$$

i.e. the conditional loss distribution gives the conditional distribution of the
loss $L_{t+1}$ in the next time period given current information $\mathcal{F}_t$.
Conditional distributions are particularly relevant in market risk management.

The unconditional loss distribution $F_{L_{t+1}}$, on the other hand, is defined as
the distribution of $l_{[t]}(\cdot)$ under the stationary distribution $F_X$ of
risk-factor changes. It gives the distribution of the portfolio loss if we consider
a generic risk-factor change $X$ with the same distribution as $X_1, \ldots, X_t$. The
unconditional loss distribution is of particular interest if the time horizon over
which we want to measure our losses is relatively large, as is frequently the case
in credit risk management and insurance.

To define conditional and unconditional distributions of linearized losses we
simply replace $l_{[t]}$ by $l^{\Delta}_{[t]}$. Of course, if the risk-factor changes form
an iid sequence, conditional and unconditional loss distributions coincide.

Risk-management techniques based on the conditional loss distribution are often
termed conditional or dynamic risk management; techniques based on the uncon- 
ditional loss distribution are often referred to as static risk management. In Sec- 
tion 2.3.6 we illustrate the difference between the two approaches. 
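
The gap between the conditional and the stationary distribution can be made concrete with a small simulation. The following is only a sketch under stated assumptions: it uses a GARCH(1,1) recursion with Gaussian innovations, and the parameter values, the seed and the function name `simulate_garch11` are illustrative choices, not taken from the text.

```python
import random

def simulate_garch11(n, alpha0, alpha1, beta, seed=0):
    """Simulate n risk-factor changes from a GARCH(1,1) model (illustrative).

    Returns the simulated changes and the variance of the next change
    conditional on the simulated history.
    """
    rng = random.Random(seed)
    sigma2 = alpha0 / (1.0 - alpha1 - beta)  # start at the stationary variance
    xs = []
    for _ in range(n):
        x = (sigma2 ** 0.5) * rng.gauss(0.0, 1.0)
        xs.append(x)
        # conditional variance of the next change, given the history so far
        sigma2 = alpha0 + alpha1 * x * x + beta * sigma2
    return xs, sigma2

alpha0, alpha1, beta = 1e-6, 0.10, 0.85           # assumed parameters
xs, cond_var = simulate_garch11(1000, alpha0, alpha1, beta)
uncond_var = alpha0 / (1.0 - alpha1 - beta)       # variance under F_X

print(f"conditional variance of the next change: {cond_var:.3e}")
print(f"stationary (unconditional) variance:     {uncond_var:.3e}")
```

The two printed variances generally differ, which is the sense in which the conditional distribution of the next risk-factor change is not the stationary one; for an iid series the recursion would be constant and the two would coincide.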


2.1.3 Mapping of Risks: Some Examples 


We now consider a number of examples from the area of market and credit risk, 
illustrating how typical risk-management problems fit into the framework of the pre- 
vious section. Altogether there are five examples in this section. While we strongly 
encourage the reader to study at least two or three of them to develop intuition for 
the mapping of risks, it is possible to skip some of the examples at first reading. 


Example 2.4 (stock portfolio). Consider a fixed portfolio of $d$ stocks and denote
by $\lambda_i$ the number of shares of stock $i$ in the portfolio at time $t$. The price
process of stock $i$ is denoted by $(S_{t,i})_{t \in \mathbb{N}}$. Following standard practice
in finance and risk management we use logarithmic prices as risk factors, i.e. we
take $Z_{t,i} := \ln S_{t,i}$, $1 \le i \le d$. The risk-factor changes
$X_{t+1,i} = \ln S_{t+1,i} - \ln S_{t,i}$ then correspond to the log-returns of the
stocks in the portfolio. We get $V_t = \sum_{i=1}^{d} \lambda_i \exp(Z_{t,i})$ and hence

$$L_{t+1} = -(V_{t+1} - V_t) = -\sum_{i=1}^{d} \lambda_i S_{t,i} (\exp(X_{t+1,i}) - 1).$$

The linearized loss $L^{\Delta}_{t+1}$ is then given by

$$L^{\Delta}_{t+1} = -\sum_{i=1}^{d} \lambda_i S_{t,i} X_{t+1,i} = -V_t \sum_{i=1}^{d} w_{t,i} X_{t+1,i}, \qquad (2.8)$$

where the weight $w_{t,i} := (\lambda_i S_{t,i})/V_t$ gives the proportion of the portfolio
value invested in stock $i$ at time $t$. The corresponding linearized loss operator
is given by $l^{\Delta}_{[t]}(x) = -V_t w_t' x := -V_t \sum_{i=1}^{d} w_{t,i} x_i$. Given the
mean vector and covariance matrix of the distribution of the risk-factor changes
it is very easy to compute the first two moments of the distribution of the
linearized loss $L^{\Delta}$. Suppose that the random vector $X$ follows a distribution
with mean vector $\mu$ and covariance matrix $\Sigma$. Using general rules for the mean
and variance of linear combinations of a random vector (see also equations (3.7)
and (3.8)) we immediately get

$$E(l^{\Delta}_{[t]}(X)) = -V_t w_t' \mu \quad \text{and} \quad \operatorname{var}(l^{\Delta}_{[t]}(X)) = V_t^2 w_t' \Sigma w_t. \qquad (2.9)$$

Applied to the mean vector $\mu_t$ and the covariance matrix $\Sigma_t$ of the
conditional distribution $F_{X_{t+1} \mid \mathcal{F}_t}$ of the risk-factor changes, (2.9)
yields the first two moments of the conditional loss distribution; applied to the
mean vector $\mu$ and the covariance matrix $\Sigma$ of the unconditional distribution
$F_X$ of the risk-factor changes, (2.9) yields the first two moments of the
unconditional loss distribution.
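
As a numerical illustration of Example 2.4, the sketch below evaluates the exact loss, the linearized loss (2.8) and the two moments in (2.9) for a two-stock portfolio. All inputs (holdings, prices, log-returns, mean vector and covariance matrix) and the function names are made up for illustration.

```python
import math

def stock_losses(shares, prices, log_returns):
    """Exact loss L_{t+1} and linearized loss (2.8) of a stock portfolio."""
    exact = -sum(n * s * (math.exp(x) - 1.0)
                 for n, s, x in zip(shares, prices, log_returns))
    linear = -sum(n * s * x for n, s, x in zip(shares, prices, log_returns))
    return exact, linear

def linearized_loss_moments(shares, prices, mu, sigma):
    """Mean and variance of the linearized loss as in (2.9)."""
    value = sum(n * s for n, s in zip(shares, prices))       # V_t
    w = [n * s / value for n, s in zip(shares, prices)]      # portfolio weights
    d = len(w)
    mean = -value * sum(w[i] * mu[i] for i in range(d))
    var = value ** 2 * sum(w[i] * sigma[i][j] * w[j]
                           for i in range(d) for j in range(d))
    return mean, var

# Illustrative inputs (assumptions, not data from the text)
shares = [100, 50]
prices = [20.0, 40.0]
exact, linear = stock_losses(shares, prices, [-0.02, 0.01])
mean, var = linearized_loss_moments(
    shares, prices,
    mu=[0.0005, 0.0003],
    sigma=[[4.0e-4, 1.0e-4], [1.0e-4, 2.25e-4]],
)
print(exact, linear, mean, var)
```

For small log-returns the exact and linearized losses are close; passing a conditional or an unconditional mean vector and covariance matrix to `linearized_loss_moments` gives the corresponding conditional or unconditional moments.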


Example 2.5 (European call option). We now consider a simple example of a
portfolio of derivative securities, namely a standard European call on a
non-dividend-paying stock $S$ with maturity date $T$ and exercise price $K$. We use
the Black-Scholes option-pricing formula for the valuation of our portfolio.
Define the function $C^{BS}$ by

$$C^{BS}(s, S; r, \sigma, K, T) := S \Phi(d_1) - K e^{-r(T-s)} \Phi(d_2), \qquad (2.10)$$

where $\Phi$ denotes the standard normal distribution function (df), $r$ represents
the continuously compounded risk-free interest rate, $\sigma$ denotes the annualized
volatility of the underlying stock, and where

$$d_1 = \frac{\ln(S/K) + (r + \tfrac{1}{2}\sigma^2)(T - s)}{\sigma \sqrt{T - s}} \quad \text{and} \quad d_2 = d_1 - \sigma \sqrt{T - s}. \qquad (2.11)$$

Following market convention, time in (2.10) is measured in years so that
Remark 2.2 applies. We are interested in daily losses and set $\Delta = 1/250$.

An obvious risk factor to choose for this portfolio is the log-price of the
underlying stock. While in the Black-Scholes option-pricing model the interest
rate and volatility are assumed to be constant, in real markets interest rates
change constantly as do the implied volatilities that practitioners tend to use
as inputs for the volatility parameter. Hence we take
$Z_t = (\ln S_t, r_t, \sigma_t)'$ as the vector of risk factors. According to the
Black-Scholes formula the value of the call option on day $t$ equals
$V_t = C^{BS}(t\Delta, S_t; r_t, \sigma_t, K, T)$. The risk-factor changes are given by

$$X_{t+1} = (\ln S_{t+1} - \ln S_t, \; r_{t+1} - r_t, \; \sigma_{t+1} - \sigma_t)',$$

so that the linearized loss is given by

$$L^{\Delta}_{t+1} = -(C^{BS}_s \Delta + C^{BS}_S S_t X_{t+1,1} + C^{BS}_r X_{t+1,2} + C^{BS}_\sigma X_{t+1,3}),$$


where the subscripts denote partial derivatives. Note that we have omitted the
arguments of $C^{BS}$ to simplify the notation. The derivatives of the
Black-Scholes option-pricing function are often referred to as the Greeks:
$C^{BS}_S$ (the partial derivative with respect to the stock price $S$) is called
the delta of the option; $C^{BS}_s$ (the partial derivative with respect to calendar
time $s$) is called the theta of the option; $C^{BS}_r$ (the partial derivative with
respect to the interest rate $r$) is called the rho of the option; in a slight
abuse of the Greek language, $C^{BS}_\sigma$ (the partial derivative with respect to
volatility $\sigma$) is called the vega of the option. The Greeks play an important
role in the risk management of derivative portfolios.

The reader should keep in mind that for portfolios with derivatives the linearized
loss can be a rather poor approximation of the true loss, since the portfolio value
is often a highly nonlinear function of the risk factors. This has led to the
development of higher-order approximations such as the delta-gamma approximation,
where first- and second-order derivatives are used. (The second derivative
$C^{BS}_{SS}$ is called the gamma of an option.) In Notes and Comments we provide a
number of further references dealing with this issue.
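
To see how the linearized loss of Example 2.5 compares with the full revaluation loss, the following sketch implements (2.10) and (2.11), the four Greeks in their standard closed forms, and both losses for one day. The market inputs and the risk-factor changes are invented for illustration, and the helper names (`bs_call`, `bs_greeks`, and so on) are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def d1_d2(s, S, r, sigma, K, T):
    """The quantities d1 and d2 of (2.11)."""
    tau = T - s
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)

def bs_call(s, S, r, sigma, K, T):
    """Black-Scholes call price (2.10)."""
    d1, d2 = d1_d2(s, S, r, sigma, K, T)
    return S * norm_cdf(d1) - K * math.exp(-r * (T - s)) * norm_cdf(d2)

def bs_greeks(s, S, r, sigma, K, T):
    """Theta (w.r.t. calendar time s), delta, rho and vega of the call."""
    tau = T - s
    d1, d2 = d1_d2(s, S, r, sigma, K, T)
    disc_strike = K * math.exp(-r * tau)
    theta = (-S * norm_pdf(d1) * sigma / (2.0 * math.sqrt(tau))
             - r * disc_strike * norm_cdf(d2))
    delta = norm_cdf(d1)
    rho = tau * disc_strike * norm_cdf(d2)
    vega = S * norm_pdf(d1) * math.sqrt(tau)
    return theta, delta, rho, vega

# Illustrative inputs (assumptions): at-the-money call, one year to maturity
s, S, r, sigma, K, T = 0.0, 100.0, 0.05, 0.2, 100.0, 1.0
dt = 1.0 / 250.0                      # one trading day in years
x = (0.01, 0.0001, 0.002)             # changes in log-price, rate, volatility

value = bs_call(s, S, r, sigma, K, T)
theta, delta, rho, vega = bs_greeks(s, S, r, sigma, K, T)
linear_loss = -(theta * dt + delta * S * x[0] + rho * x[1] + vega * x[2])
full_loss = -(bs_call(s + dt, S * math.exp(x[0]), r + x[1], sigma + x[2], K, T) - value)
print(value, linear_loss, full_loss)
```

The difference between `full_loss` and `linear_loss` is small for these modest moves and grows with larger risk-factor changes, since the curvature of the option value is ignored in the linearization.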


Example 2.6 (bond portfolio). Next we consider a portfolio of $d$ default-free
zero-coupon bonds with maturity $T_i$ and price $p(s, T_i)$, $1 \le i \le d$ (again time
is measured in years, so Remark 2.2 applies). By $\lambda_i$ we denote the number of
bonds with maturity $T_i$ in the portfolio. While zero-coupon bonds of longer
maturities are relatively rare in practice, our example is relevant, since many
fixed-income instruments such as coupon bonds or standard swaps can be viewed as
portfolios of zero-coupon bonds.

We follow a standard convention in modern interest-rate theory and normalize
the face value $p(T, T)$ of the bond to one. Recall that the continuously
compounded yield of a zero-coupon bond is defined as
$y(s, T) := -(1/(T - s)) \ln p(s, T)$, i.e. we have

$$p(s, T) = \exp(-(T - s) y(s, T)).$$

The mapping $T \mapsto y(s, T)$ is referred to as the continuously compounded yield
curve at time $s$. In a detailed analysis of the change in value of a bond portfolio
one takes all yields $y(s, T_i)$, $1 \le i \le d$, as risk factors. The value of the
portfolio at time $s$ is then given by $V(s) = \sum_{i=1}^{d} \lambda_i p(s, T_i)$, and in
our mapping notation (2.2) we have

$$V_t = \sum_{i=1}^{d} \lambda_i p(t\Delta, T_i) = \sum_{i=1}^{d} \lambda_i \exp(-(T_i - t\Delta) y(t\Delta, T_i)). \qquad (2.12)$$

From this formula the loss $L_{t+1}$ is easily computed. Taking derivatives and using
the definition of the linearized loss $L^{\Delta}_{t+1}$ in (2.5) we also get

$$L^{\Delta}_{t+1} = -\sum_{i=1}^{d} \lambda_i p(t\Delta, T_i) \bigl( y(t\Delta, T_i) \Delta - (T_i - t\Delta) X_{t+1,i} \bigr), \qquad (2.13)$$

where the risk-factor changes are $X_{t+1,i} = y((t+1)\Delta, T_i) - y(t\Delta, T_i)$.

This formula is closely related to the classical concept of duration. Suppose
that the yield curve is flat, i.e. $y(s, T) = y(s)$ independently of $T$, and that the
only possible changes in interest rates are parallel shifts of the yield curve so
that $y(s + \Delta, T) = y(s) + \delta$ for all $T$. These assumptions are clearly
unrealistic but frequently made in practice. Then $L^{\Delta}_{t+1}$ can be written as

$$L^{\Delta}_{t+1} = -V_t \Bigl( y(t\Delta) \Delta - \sum_{i=1}^{d} \frac{\lambda_i p(t\Delta, T_i)}{V_t} (T_i - t\Delta) \delta \Bigr) = -V_t \bigl( y(t\Delta) \Delta - D \delta \bigr),$$

where

$$D := \sum_{i=1}^{d} \frac{\lambda_i p(t\Delta, T_i)}{V_t} (T_i - t\Delta)$$



is a weighted sum of the times to maturity of the different cash flows in the portfolio, 
the weights being proportional to the discounted value of the cash flows. D is usually 
called the duration of a bond portfolio. The duration is an important tool in traditional 
bond-portfolio or asset-liability management. The standard duration-based strategy 
to manage the interest risk of a bond portfolio is called immunization. Under this 
strategy an asset manager, who has a certain amount of funds to invest in various 
bonds and who needs to make certain known payments in the future, allocates these 
funds to various bonds in such a way that the duration of the overall portfolio 
consisting of bond investment and liabilities is equal to zero. As we have just seen, 
duration measures the sensitivity of the portfolio value with respect to parallel shifts
of the yield curve. Hence a zero duration means that the position has been immunized
against this type of yield-curve change. However, the portfolio is still exposed to
other types of yield-curve changes.
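
The duration-based form of the linearized loss can be checked numerically. The sketch below assumes a flat yield curve and a parallel shift, as in the text; the holdings, maturities, yield level and size of the shift are illustrative assumptions, and the function names are hypothetical.

```python
import math

def zero_price(s, T, y):
    """Price at time s of a zero-coupon bond with maturity T and yield y."""
    return math.exp(-(T - s) * y)

def value_and_duration(s, maturities, holdings, y):
    """Portfolio value and duration D under a flat yield curve at level y."""
    prices = [zero_price(s, T, y) for T in maturities]
    value = sum(n * p for n, p in zip(holdings, prices))
    duration = sum(n * p * (T - s)
                   for n, p, T in zip(holdings, prices, maturities)) / value
    return value, duration

# Illustrative inputs (assumptions)
s, dt = 0.0, 1.0 / 250.0
maturities = [1.0, 5.0, 10.0]
holdings = [100.0, 100.0, 100.0]
y, shift = 0.04, 0.001               # flat yield and parallel shift

value, D = value_and_duration(s, maturities, holdings, y)
linear_loss = -value * (y * dt - D * shift)

new_value = sum(n * zero_price(s + dt, T, y + shift)
                for n, T in zip(holdings, maturities))
exact_loss = -(new_value - value)
print(D, linear_loss, exact_loss)
```

For this small shift the duration-based loss is close to the loss from full revaluation; the remaining gap comes from the second-order terms that the linearization drops.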

If we consider large portfolios of fixed-income instruments, such as the overall 
fixed-income position of a major bank, choosing the yield of every bond in the port- 
folio as a risk factor becomes impractical: one ends up with too many risk factors, 
which renders the estimation of the distribution of the risk-factor changes impossible. 
To overcome this problem one therefore picks a few benchmark yields per coun- 
try and uses a more-or-less ad hoc procedure to map cash flows at days between 
benchmark points to the two nearest benchmark points. We refer to Section 6.2 of 
the RiskMetrics technical document (JPMorgan 1996) for details. 


Example 2.7 (currency forwards). We now consider the mapping of a long position
in a currency forward. A currency or foreign exchange (FX) forward is an agreement
between two parties to buy/sell a pre-specified amount $\bar{V}$ of foreign currency
at a future time point $T > s$ at a pre-specified exchange rate $\bar{e}$. The future
buyer is said to hold a long position, the other party is said to hold a short
position in the contract.

To map this position we use the fact that a long position in the forward is
equivalent to a long position in a foreign and a short position in a domestic
zero-coupon bond. For illustration think of a euro investor holding a long position
of size $\bar{V}$ in a currency forward on the USD/euro exchange rate. Denote by
$p^f(s, T)$ the USD price of an American (foreign) zero-coupon bond and by
$p^d(s, T)$ the corresponding euro (domestic) zero-coupon bond; the USD/euro spot
exchange rate is denoted by $e(s)$. Then the value in euro at time $T$ of a portfolio
consisting of $\lambda_1 := \bar{V}$ foreign and $\lambda_2 := -\bar{e}\bar{V}$ domestic
zero-coupon bonds equals $V_T = \bar{V}(e_T - \bar{e})$, which is obviously equal to the
pay-off of the long position in the forward.

The short position in the domestic zero-coupon bond can be dealt with as in
Example 2.6. Hence it remains to consider the position in the American zero-coupon
bond. Obvious risk factors to choose are the logarithmic exchange rate and the
yield of the US zero-coupon bond, i.e. $Z_t = (\ln e_t, y^f(t\Delta, T))'$. The value of
the position in the foreign bond then equals

$$V_t = \bar{V} \exp(Z_{t,1} - (T - t\Delta) Z_{t,2}),$$

and we get

$$L^{\Delta}_{t+1} = -V_t \bigl( Z_{t,2} \Delta + X_{t+1,1} - (T - t\Delta) X_{t+1,2} \bigr),$$

where as usual $X_{t+1}$ represents the risk-factor changes.
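
The foreign-bond leg of Example 2.7 can be illustrated in the same way. In the sketch below the notional, exchange rate, foreign yield and risk-factor changes are invented numbers, and `foreign_bond_value` is a hypothetical helper that evaluates the euro value of the foreign zero-coupon position from the two risk factors.

```python
import math

def foreign_bond_value(notional, log_fx, foreign_yield, time_to_maturity):
    """Domestic-currency value of a position in a foreign zero-coupon bond."""
    return notional * math.exp(log_fx - time_to_maturity * foreign_yield)

# Illustrative inputs (assumptions)
notional = 1_000_000.0               # amount of foreign currency
T, s, dt = 1.0, 0.0, 1.0 / 250.0     # maturity, current time, one day (years)
z = (math.log(0.9), 0.03)            # risk factors: log exchange rate, yield
x = (0.005, 0.0002)                  # risk-factor changes over one day

v_t = foreign_bond_value(notional, z[0], z[1], T - s)
v_next = foreign_bond_value(notional, z[0] + x[0], z[1] + x[1], T - s - dt)

exact_loss = -(v_next - v_t)
linear_loss = -v_t * (z[1] * dt + x[0] - (T - s) * x[1])
print(exact_loss, linear_loss)
```

Both losses are negative here, i.e. the position gains, because the assumed rise in the exchange rate outweighs the effect of the higher foreign yield.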


Example 2.8 (stylized portfolio of risky loans). In our final example, which comes 
from the area of credit risk management, we show how losses from a portfolio 
of loans fit into our general framework; a detailed discussion of models for loan 
portfolios will be presented in Chapter 8. A loan portfolio is subject to many risks. 
The most important ones are default risk, i.e. the risk that some counterparties cannot 
repay their loans; interest-rate risk, i.e. the risk that the present value of the future 
cash flows from the portfolio is diminished due to rising interest rates; and, finally,
the risk of losses caused by rising credit spreads. 

We consider a portfolio of loans to $m$ different counterparties; the size of the
exposure to counterparty $i$ is denoted by $e_i$. Following standard practice in
credit risk management, our risk-management horizon $\Delta$ is taken to be one year so
that there is no need to distinguish between the two timescales (i.e. $t$ and $s$).
For simplicity we assume that all loans are repaid at the same date $T > t$ and that
there are no payments prior to $T$. We introduce an rv $Y_{t,i}$ that represents the
default state of counterparty $i$ at $t$; we set $Y_{t,i} = 1$ if counterparty $i$
defaults in the time period $[0, t]$ and $Y_{t,i} = 0$ otherwise. Again for simplicity
we assume a recovery rate of zero, i.e. we assume that upon default of obligor $i$
the whole exposure $e_i$ is lost.

In valuing a risky loan we have to take the possibility of default into account.
Typically, this is done by discounting the cash flow $e_i$ at a higher rate than the
yield $y(t, T)$ of a default-free zero-coupon bond. More precisely, we model the
value at time $t$ of such a loan as

$$\exp(-(T - t)(y(t, T) + c_i(t, T))) e_i;$$

$c_i(t, T)$ is then referred to as the credit spread of company $i$ corresponding to
the maturity $T$. Credit spreads are often determined from the prices of traded
corporate bonds issued by companies with a similar credit quality to the
counterparty under consideration. Alternatively, a formal pricing model using, for
instance, the market value of the counterparty's stock as main input can be used
(see Chapter 8, in particular Section 8.2, for more information). Again for
simplicity we ignore variations in credit quality and assume that
$c_i(t, T) = c(t, T)$ for all $i$. Under all these simplifying assumptions the value
at time $t$ of our loan portfolio equals

$$V_t = \sum_{i=1}^{m} (1 - Y_{t,i}) \exp(-(T - t)(y(t, T) + c(t, T))) e_i. \qquad (2.14)$$

This suggests the following $(m + 2)$-dimensional random vector of risk factors

$$Z_t = (Y_{t,1}, \ldots, Y_{t,m}, y(t, T), c(t, T))'. \qquad (2.15)$$

$L_{t+1}$ and $l_{[t]}$ are now easy to compute using (2.14). Due to the discrete nature
of the default indicators and the long time horizon, linearized losses are of
little importance in credit risk management. It is apparent from (2.14) and (2.15)
that the main difficulty in modelling the loss distribution of loan portfolios is
in finding and calibrating a good model for the joint distribution of the default
indicators $Y_{t+1,i}$, $1 \le i \le m$; this issue is taken up in Chapter 8.
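
A direct evaluation of (2.14) shows how a one-year loss arises from a default combined with changes in the yield and the credit spread. The exposures, rates and default pattern below are illustrative assumptions, and `loan_portfolio_value` is a hypothetical helper name.

```python
import math

def loan_portfolio_value(defaults, exposures, t, T, y, c):
    """Value (2.14) of a loan portfolio with zero recovery.

    defaults[i] is the default indicator of counterparty i at time t,
    y the default-free yield and c the common credit spread.
    """
    discount = math.exp(-(T - t) * (y + c))
    return sum((1 - d) * discount * e for d, e in zip(defaults, exposures))

# Illustrative inputs (assumptions): three loans repaid at T = 3 years
exposures = [100.0, 200.0, 300.0]
v_t = loan_portfolio_value([0, 0, 0], exposures, t=0, T=3, y=0.03, c=0.02)
# One year later: counterparty 2 has defaulted, yield and spread have risen
v_next = loan_portfolio_value([0, 1, 0], exposures, t=1, T=3, y=0.035, c=0.025)

loss = -(v_next - v_t)
print(v_t, v_next, loss)
```

The loss is dominated by the jump in the default indicator rather than by the smooth changes in the two rates, which is why a first-order approximation in the risk factors is of little use here.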


Notes and Comments 


The framework introduced in this section is a stylized version of the model intro- 
duced by the RiskMetrics Group. A summary of the earlier work of the RiskMet- 
rics Group is the RiskMetrics Technical Document (JPMorgan 1996); an excellent 
updated summary, which also discusses some recent developments on the academic 
side, is Mina and Xiao (2001). The mapping of positions is also discussed in Jorion 
(2001) and Dowd (1998). The differences between conditional and unconditional
risk management are highlighted in McNeil and Frey (2000). 

While not very satisfactory from a theoretical point of view, duration-based hedg- 
ing remains popular with practitioners. For a detailed discussion of duration and its 
use in the management of interest-rate risk we refer the reader to standard finance 
textbooks such as Jarrow and Turnbull (1999) or Hull (1997). 

The mapping of derivative portfolios using first- and second-order approximations 
to the portfolio value (the so-called delta-gamma approximation) is discussed in 
Duffie and Pan (1997) and Rouvinez (1997) (see also Duffie and Pan 2001). 


2.2 Risk Measurement 


In this section we give an overview of existing approaches to measuring risk in 
financial institutions. In discussing strengths and weaknesses of these approaches 
we focus on practical aspects and postpone a proper discussion of the theoretical 
properties of the risk measures (issues such as subadditivity and coherence) until 
Chapter 6. 

In practice risk measures are used for a variety of purposes. Among the most 
important are the following. 


Determination of risk capital and capital adequacy. As discussed in Chapter 1, 
one of the principal functions of risk management in the financial sector is to 
determine the amount of capital a financial institution needs to hold as a buffer 
against unexpected future losses on its portfolio in order to satisfy a regulator, who 
is concerned with the solvency of the institution. A related problem is the deter- 
mination of appropriate margin requirements for investors trading at an organized 
exchange, which is typically done by the clearing house of the exchange. 


Management tool. Risk measures are often used by management as a tool for 
limiting the amount of risk a unit within a firm may take. For instance, traders in 
a bank are often constrained by the rule that the daily 95% Value-at-Risk of their 
position should not exceed a given bound. 


Insurance premiums. Insurance premiums compensate an insurance company for 
bearing the risk of the insured claims. The size of this compensation can be viewed 
as a measure of the risk of these claims. 


2.2.1 Approaches to Risk Measurement 


Existing approaches to measuring the risk of a financial position can be grouped into 
four different categories: the notional-amount approach; factor-sensitivity measures; 
risk measures based on the loss distribution; risk measures based on scenarios. 


Notional-amount approach. This is the oldest approach to quantifying the risk of 
a portfolio of risky assets. In the notional-amount approach the risk of a portfolio is 
defined as the sum of the notional values of the individual securities in the portfolio, 
where each notional value may be weighted by a factor representing an assessment 
of the riskiness of the broad asset class to which the security belongs. Variants of 
this approach are still in use in the standardized approach of the Basel Committee
rules on banking regulation; see, for example, Section 10.1.2 for operational risk, 
or Chapter 2 of Crouhy, Galai and Mark (2001). 

The advantage of the notional-amount approach is its apparent simplicity. How- 
ever, as we recall from Chapter 1, from an economic viewpoint the approach is 
flawed for a number of reasons. To begin with, the approach does not differenti- 
ate between long and short positions and there is no netting. For instance, the risk 
of a long position in foreign currency hedged by an offsetting short position in a 
currency forward would be counted as twice the risk of the unhedged currency posi- 
tion. Moreover, the approach does not reflect the benefits of diversification on the 
overall risk of the portfolio. For example, if we use the notional-amount approach, 
it appears that a well-diversified credit portfolio consisting of loans to m companies 
that default more or less independently has the same risk as a portfolio where the 
whole amount is lent to a single company. Finally, the notional-amount approach 
has problems in dealing with portfolios of derivatives, where the notional amount of 
the underlying and the economic value of the derivative position can differ widely. 


Factor-sensitivity measures. Factor-sensitivity measures give the change in port- 
folio value for a given predetermined change in one of the underlying risk factors; 
typically they take the form of a derivative (in the calculus sense). Important factor- 
sensitivity measures are the duration for bond portfolios and the Greeks for portfolios 
of derivatives. While these measures provide useful information about the robust- 
ness of the portfolio value with respect to certain well-defined events, they cannot 
measure the overall riskiness of a position. Moreover, factor-sensitivity measures 
create problems in the aggregation of risks. 


• For a given portfolio it is not possible to aggregate the sensitivity with respect
to changes in different risk factors. For instance, it makes no sense to simply
add the delta and the vega of a portfolio of options.


• Factor-sensitivity measures cannot be aggregated across markets to create a
picture of the overall riskiness of the portfolio of a financial institution.


Hence these measures are not very useful for capital-adequacy decisions; used in 
conjunction with other measures they can be useful for setting position limits. 


Risk measures based on loss distributions. Most modern measures of the risk in 
a portfolio are statistical quantities describing the conditional or unconditional loss 
distribution of the portfolio over some predetermined horizon $\Delta$. Examples include
the variance, the Value-at-Risk and the expected shortfall, which we discuss in more
detail in Sections 2.2.2–2.2.4. It is of course problematic to rely on any one particular
statistic to summarize the risk contained in a distribution. However, the view that 
the loss distribution as a whole gives an accurate picture of the risk in a portfolio 
has much to commend it: 


• losses are the central object of interest in risk management and so it is natural
to base a measure of risk on their distribution;

• the concept of a loss distribution makes sense on all levels of aggregation
from a portfolio consisting of a single instrument to the overall position of a
financial institution;


• if estimated properly, the loss distribution reflects netting and diversification
effects; and, finally,


• loss distributions can be compared across portfolios.


For instance, it makes perfect sense to compare the loss distribution of a book of 
fixed-income instruments and of a portfolio of equity derivatives, at least if the time 
horizon $\Delta$ is the same in both cases.

There are two major problems when working with loss distributions. First, any 
estimate of the loss distribution is based on past data. If the laws governing financial 
markets change, these past data are of limited use in predicting future risk. The 
second, related problem is practical. Even in a stationary environment it is difficult to 
estimate the loss distribution accurately, particularly for large portfolios, and many 
seemingly sophisticated risk-management systems are based on relatively crude 
statistical models for the loss distribution (incorporating, for example, untenable 
assumptions of normality). 

However, this is not an argument against using loss distributions. Rather, it calls 
for improvements in the way loss distributions are estimated and, of course, for 
prudence in the practical application of risk-management models based on esti- 
mated loss distributions. In particular, risk measures based on the loss distribution 
should be complemented by information from hypothetical scenarios. Moreover, 
forward-looking information reflecting the expectations of market participants, such 
as implied volatilities, should be used in conjunction with statistical estimates (which 
are necessarily based on past information) in calibrating models of the loss distri- 
bution. 


Scenario-based risk measures. In the scenario-based approach to measuring the
risk of a portfolio one considers a number of possible future risk-factor changes
(scenarios), such as a 10% rise in key exchange rates, a simultaneous 20% drop
in major stock market indices or a simultaneous rise of key interest rates around
the globe. The risk of the portfolio is then measured as the maximum loss of the
portfolio under all scenarios, where certain extreme scenarios can be downweighted
to mitigate their effect on the result.

We now give a formal description. Fix a set $\mathcal{X} = \{x_1, \ldots, x_n\}$ of
risk-factor changes (the scenarios) and a vector
$w = (w_1, \ldots, w_n)' \in [0, 1]^n$ of weights. Consider a portfolio of risky
securities and denote by $l_{[t]}$ the corresponding loss operator. The risk of this
portfolio is then measured as

$$\psi_{[\mathcal{X}, w]} = \max\{w_1 l_{[t]}(x_1), \ldots, w_n l_{[t]}(x_n)\}. \qquad (2.16)$$


Many risk measures used in practice are of the form (2.16). For instance, the Chicago 
Mercantile Exchange (CME) uses a scenario-based approach to determine margin 
requirements. To compute the initial margin for a simple portfolio consisting of a 
position in a futures contract and call and put options on this contract, the CME
considers sixteen different scenarios. The first fourteen consist of an up move or
a down move of volatility combined with no move, an up move or a down move
of the futures price by 1/3, 2/3 or 3/3 units of a specified range. The weights
$w_i$, $i = 1, \ldots, 14$, of these scenarios are equal to one. In addition there are
two extreme scenarios with weights $w_{15} = w_{16} = 0.35$. The amount of capital
required by the exchange as margin for a portfolio is then computed according to
(2.16).
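
Formula (2.16) translates almost literally into code. The sketch below uses a toy loss operator that is linear in a price change and a volatility change, with three hand-picked scenarios; it does not reproduce the actual CME scenario set, and all names and numbers are illustrative assumptions.

```python
def scenario_risk(loss_operator, scenarios, weights):
    """Scenario-based risk measure (2.16): maximum weighted scenario loss."""
    return max(w * loss_operator(x) for x, w in zip(scenarios, weights))

def toy_loss(x):
    """Toy loss operator (assumption): linear in price and volatility changes."""
    price_change, vol_change = x
    return -(10.0 * price_change + 5.0 * vol_change)

scenarios = [(-3.0, 1.0), (3.0, -1.0), (-9.0, 0.0)]   # last one is "extreme"
weights = [1.0, 1.0, 0.35]                            # extreme scenario downweighted

risk = scenario_risk(toy_loss, scenarios, weights)
print(risk)
```

In this toy example the downweighted extreme scenario still determines the result, because 35% of its loss exceeds the full loss in either ordinary scenario.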


Remark 2.9. We can give a slightly different mathematical interpretation to
formula (2.16), which will be useful in Section 6.1. Assume for the moment that
$l_{[t]}(0) = 0$, i.e. that the value of the position is unchanged if all risk factors
stay the same. This is reasonable, at least for a short risk-management horizon
$\Delta$. In that case the expression $w_i l_{[t]}(x_i)$ can be viewed as the expected value
of $l_{[t]}$ under a probability measure on the space of risk-factor changes; this
measure associates a mass of $w_i \in [0, 1]$ to the point $x_i$ and a mass of $1 - w_i$
to the point $0$. Denote by $\delta_x$ the probability measure associating a mass of one
to the point $x \in \mathbb{R}^d$ and by $\mathcal{P}_{[\mathcal{X}, w]}$ the following set of
probability measures on $\mathbb{R}^d$:

$$\mathcal{P}_{[\mathcal{X}, w]} = \{w_1 \delta_{x_1} + (1 - w_1)\delta_0, \ldots, w_n \delta_{x_n} + (1 - w_n)\delta_0\}.$$

Then $\psi_{[\mathcal{X}, w]}$ can be written as

$$\psi_{[\mathcal{X}, w]} = \max\{E^P(l_{[t]}(X)) : P \in \mathcal{P}_{[\mathcal{X}, w]}\}. \qquad (2.17)$$

A risk measure of the form (2.17), where $\mathcal{P}_{[\mathcal{X}, w]}$ is replaced by some
arbitrary subset $\mathcal{P}$ of the set of all probability measures on the space of
risk-factor changes, is termed a generalized scenario. Generalized scenarios play
an important role in the theory of coherent risk measures (see Section 6.1).
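
The step from (2.16) to (2.17) can be written out in one line. Writing $P_i$ for the $i$th two-point measure in the set above (a label introduced here only for this illustration), and using the assumption that the loss operator vanishes at zero:

```latex
E^{P_i}\bigl(l_{[t]}(X)\bigr)
  = w_i \, l_{[t]}(x_i) + (1 - w_i) \, l_{[t]}(0)
  = w_i \, l_{[t]}(x_i),
  \qquad i = 1, \ldots, n.
```

Taking the maximum over $i = 1, \ldots, n$ on both sides then recovers (2.16).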


Scenario-based risk measures are a very useful risk-management tool for port- 
folios exposed to a relatively small set of risk factors as in the CME example. 
Moreover, they provide useful complementary information to measures based on 
statistics of the loss distribution. The main problem is of course to determine an 
appropriate set of scenarios and weighting factors. Moreover, comparison across 
portfolios which are affected by different risk factors is difficult. 


2.2.2 Value-at-Risk 


Value-at-Risk (VaR) is probably the most widely used risk measure in financial
institutions and has also made its way into the Basel II capital-adequacy
framework; hence it merits an extensive discussion. In this chapter we introduce
VaR and discuss practical issues surrounding its use; in Section 6.1 we examine VaR
from the viewpoint of coherent risk measures and highlight certain theoretical
deficiencies.

Consider some portfolio of risky assets and a fixed time horizon $\Delta$, and denote
by $F_L(l) = P(L \le l)$ the df of the corresponding loss distribution. We do not
distinguish between $L$ and $L^{\Delta}$ or between conditional and unconditional loss
distributions; rather we assume that the choice has been made at the outset of the
analysis and that $F_L$ represents the distribution of interest. We want to define a
statistic based on $F_L$ which measures the severity of the risk of holding our
portfolio over the time period $\Delta$. An obvious candidate is the maximum possible
loss, given by $\inf\{l \in \mathbb{R} : F_L(l) = 1\}$, a risk measure important in
reinsurance. However, in most models of interest the support of $F_L$ is unbounded
so that the maximum loss is simply infinity. Moreover, by using the maximum loss we
neglect any probability information in $F_L$. Value-at-Risk is a straightforward
extension of maximum loss, which takes these criticisms into account. The idea is
simply to replace “maximum loss” by “maximum loss which is not exceeded with a
given high probability”, the so-called confidence level.


Definition 2.10 (Value-at-Risk). Given some confidence level $\alpha \in (0, 1)$, the
VaR of our portfolio at the confidence level $\alpha$ is given by the smallest number
$l$ such that the probability that the loss $L$ exceeds $l$ is no larger than
$(1 - \alpha)$. Formally,

$$\mathrm{VaR}_\alpha = \inf\{l \in \mathbb{R} : P(L > l) \le 1 - \alpha\} = \inf\{l \in \mathbb{R} : F_L(l) \ge \alpha\}. \qquad (2.18)$$

In probabilistic terms, VaR is thus simply a quantile of the loss distribution.
Typical values for $\alpha$ are $\alpha = 0.95$ or $\alpha = 0.99$; in market risk management
the time horizon $\Delta$ is usually 1 or 10 days, in credit risk management and
operational risk management $\Delta$ is usually one year. Note that by its very
definition the VaR at confidence level $\alpha$ does not give any information about the
severity of losses which occur with a probability less than $1 - \alpha$. This is
clearly a drawback of VaR as a risk measure. For a small case study that
illustrates this problem numerically we refer to Example 2.21 below.
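
The definition (2.18) can be applied directly to the empirical df of a sample of losses, where the infimum is attained at an order statistic. The sketch below is an illustration under that assumption only; the sample values are invented and `empirical_var` is a hypothetical helper, not a method prescribed by the text.

```python
def empirical_var(losses, alpha):
    """VaR at level alpha for the empirical df of a loss sample.

    Returns the smallest sample point l with F_n(l) >= alpha, where F_n is
    the empirical distribution function of the sample.
    """
    ordered = sorted(losses)
    n = len(ordered)
    # smallest k with k/n >= alpha; F_n equals k/n at the k-th order statistic
    k = next(k for k in range(1, n + 1) if k / n >= alpha)
    return ordered[k - 1]

# Illustrative samples (assumptions): identical except for the largest loss
losses_a = [-3, -1, 0, 2, 5, -2, 1, 4, -4, 10]
losses_b = [-3, -1, 0, 2, 5, -2, 1, 4, -4, 100]

print(empirical_var(losses_a, 0.9), empirical_var(losses_b, 0.9))
print(empirical_var(losses_a, 0.95), empirical_var(losses_b, 0.95))
```

At the 90% level both samples have the same VaR although their worst losses differ by a factor of ten, which illustrates the remark that VaR carries no information about the severity of losses beyond the chosen quantile.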

Figure 2.1 illustrates the notion of VaR. The probability density function of a loss 
distribution is shown with a vertical line at the value of the 95% VaR. Note that the 
mean loss is negative ($E(L) = -2.6$), indicating that we expect to make a profit,
but the right tail of the loss distribution is quite long in comparison with the left tail. 
The 95% VaR value is approximately 2.2, indicating that there is a 5% chance that 
we lose at least this amount. 


Remark 2.11 (mean-VaR). Denote by $\mu$ the mean of the loss distribution.
Sometimes the statistic $\mathrm{VaR}^{\mathrm{mean}}_\alpha := \mathrm{VaR}_\alpha - \mu$ is used for
capital-adequacy purposes instead of ordinary VaR. If the time horizon $\Delta$ equals
one day, $\mathrm{VaR}^{\mathrm{mean}}_\alpha$ is sometimes referred to as daily earnings at risk.
The distinction between ordinary VaR and $\mathrm{VaR}^{\mathrm{mean}}_\alpha$ is of little
relevance in market risk management, where the time horizon is short and $\mu$ is
close to zero. It becomes relevant in credit, where the risk-management horizon is
longer. In particular, in loan pricing one uses $\mathrm{VaR}^{\mathrm{mean}}_\alpha$ to determine
the economic capital needed as a buffer against unexpected losses in a loan
portfolio (see Section 9.3.4 for details). Taking the expectation of the P&L
distribution into account is also important in the growing field of
asset-management risk.


Since quantiles play an important role in risk management we recall the precise 
definition. 


[Figure 2.1: probability density (vertical axis) plotted against loss (horizontal
axis), with E(L), VaR and ES marked and a region labelled "5% probability".]

Figure 2.1. An example of a loss distribution with the 95% VaR marked as a vertical line;
the mean loss is shown with a dotted line and an alternative risk measure known as the 95%
expected shortfall (see Section 2.2.4 and Definition 2.15) is marked with a dashed line.


Definition 2.12 (generalized inverse and quantile function).


(i) Given some increasing function $T : \mathbb{R} \to \mathbb{R}$, the generalized inverse of $T$ is
defined by $T^{\leftarrow}(y) := \inf\{x \in \mathbb{R} : T(x) \ge y\}$, where we use the convention
that the infimum of an empty set is $\infty$.


(ii) Given some df $F$, the generalized inverse $F^{\leftarrow}$ is called the quantile function
of $F$. For $\alpha \in (0, 1)$ the $\alpha$-quantile of $F$ is given by


$$q_\alpha(F) := F^{\leftarrow}(\alpha) = \inf\{x \in \mathbb{R} : F(x) \ge \alpha\}.$$


For an rv $X$ with df $F$ we often use the alternative notation $q_\alpha(X) := q_\alpha(F)$. If
$F$ is continuous and strictly increasing, we simply have $q_\alpha(F) = F^{-1}(\alpha)$, where
$F^{-1}$ is the ordinary inverse of $F$. To compute quantiles in more general cases we
may use the following simple criterion. 


Lemma 2.13. A point $x_0 \in \mathbb{R}$ is the $\alpha$-quantile of some df $F$ if and only if the
following two conditions are satisfied: $F(x_0) \ge \alpha$; $F(x) < \alpha$ for all $x < x_0$.


The lemma follows immediately from the definition of the generalized inverse 
and the right-continuity of F. Examples for the computation of quantiles in certain 
tricky cases and further properties of generalized inverses are given in Section A.1.2. 
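
To make Definition 2.12 and Lemma 2.13 concrete, the following minimal sketch computes quantiles of a discrete df, where the ordinary inverse does not exist. The three-point distribution and the function names (`df`, `quantile`, `satisfies_lemma`) are illustrative assumptions and do not come from the text.

```python
def df(dist, x):
    """Distribution function F(x) = P(X <= x) of a discrete rv given as {value: probability}."""
    return sum(p for v, p in dist.items() if v <= x)


def quantile(dist, alpha):
    """alpha-quantile q_alpha(F) = inf{x : F(x) >= alpha}, as in Definition 2.12(ii)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    cum = 0.0
    for v in sorted(dist):
        cum += dist[v]          # cum equals F(v) at this point
        if cum >= alpha:
            return v
    return max(dist)            # only reached if rounding keeps the total below alpha


def satisfies_lemma(dist, x0, alpha):
    """Check the two conditions of Lemma 2.13 for a df with finitely many jumps."""
    left_limit = sum(p for v, p in dist.items() if v < x0)   # sup of F(x) over x < x0
    return df(dist, x0) >= alpha and left_limit < alpha


# Hypothetical three-point distribution: F jumps at 1, 2 and 3 and is flat in between.
dist = {1.0: 0.5, 2.0: 0.25, 3.0: 0.25}

for alpha in (0.5, 0.6, 0.75, 0.9):
    q = quantile(dist, alpha)
    print(alpha, q, satisfies_lemma(dist, q, alpha))
```

Note that at $\alpha = 0.5$ the point 2 satisfies the first condition of the lemma but not the second, so it is not the quantile.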


Example 2.14 (VaR for normal and t loss distributions). Suppose that the loss
distribution $F_L$ is normal with mean $\mu$ and variance $\sigma^2$. Fix $\alpha \in (0, 1)$. Then


$$\mathrm{VaR}_\alpha = \mu + \sigma \Phi^{-1}(\alpha) \quad\text{and}\quad \mathrm{VaR}^{\mathrm{mean}}_\alpha = \sigma \Phi^{-1}(\alpha), \tag{2.19}$$


where $\Phi$ denotes the standard normal df and $\Phi^{-1}(\alpha)$ is the $\alpha$-quantile of $\Phi$. The
proof is easy: since $F_L$ is strictly increasing, by Lemma 2.13 we only have to show
that $F_L(\mathrm{VaR}_\alpha) = \alpha$. Now


$$P(L \le \mathrm{VaR}_\alpha) = P\Big(\frac{L - \mu}{\sigma} \le \Phi^{-1}(\alpha)\Big) = \Phi(\Phi^{-1}(\alpha)) = \alpha.$$

This result is routinely used in the variance-covariance approach (also known as
the delta-normal approach) to computing risk measures, which we describe in Sec- 
tion 2.3.1 below; if we work with linearized losses, and assume that our risk-factor 
changes are multivariate normal, then the resulting loss distribution is normal, and 
we can compute VaR using (2.19). 

Of course a similar result is obtained for any location-scale family and another
useful example is the Student t loss distribution. Suppose our loss $L$ is such that
$(L - \mu)/\sigma$ has a standard t distribution with $\nu$ degrees of freedom; we denote this
loss model by $L \sim t(\nu, \mu, \sigma^2)$ and note that the moments are given by $E(L) = \mu$
and $\mathrm{var}(L) = \nu\sigma^2/(\nu - 2)$ when $\nu > 2$, so that $\sigma$ is not the standard deviation of
the distribution. We get

$$\mathrm{VaR}_\alpha = \mu + \sigma t_\nu^{-1}(\alpha), \tag{2.20}$$


where $t_\nu$ denotes the df of standard t, which is available in most statistical computer
packages along with its inverse.
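
As a small illustration of (2.19), the sketch below evaluates VaR and mean-VaR for a normal loss using the inverse normal df from Python's standard library, and checks the defining property $F_L(\mathrm{VaR}_\alpha) = \alpha$ numerically. The parameter values and function names are hypothetical choices made only for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def var_normal(alpha, mu, sigma):
    """VaR_alpha = mu + sigma * Phi^{-1}(alpha) for a normal loss, as in (2.19)."""
    return mu + sigma * STD_NORMAL.inv_cdf(alpha)


def var_mean_normal(alpha, sigma):
    """VaR^mean_alpha = VaR_alpha - mu = sigma * Phi^{-1}(alpha)."""
    return sigma * STD_NORMAL.inv_cdf(alpha)


# Hypothetical loss distribution: mean -1 (an expected profit), standard deviation 2.
mu, sigma = -1.0, 2.0

for alpha in (0.95, 0.99):
    v = var_normal(alpha, mu, sigma)
    # The df of L evaluated at VaR_alpha should return alpha.
    print(alpha, round(v, 4), round(var_mean_normal(alpha, sigma), 4),
          round(NormalDist(mu, sigma).cdf(v), 6))
```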


2.2.3 Further Comments on VaR 


Non-subadditivity. VaR has been fundamentally criticized as a risk measure on the 
grounds that it has poor aggregation properties. This critique has its origins in the
work of Artzner et al. (1997, 1999), who showed that VaR is not a coherent risk 
measure, since it violates the property of subadditivity that they believe reasonable 
risk measures should have. 

We devote Section 6.1 to an in-depth discussion of this subject. At this point 
we merely remark that non-subadditivity means that if we have two loss distributions
$F_{L_1}$ and $F_{L_2}$ for two portfolios and we denote the overall loss distribution
of the merged portfolio $L = L_1 + L_2$ by $F_L$, we do not necessarily have that
$q_\alpha(F_L) \le q_\alpha(F_{L_1}) + q_\alpha(F_{L_2})$, so that the VaR of the merged portfolio is not
necessarily bounded above by the sum of the VaRs of the individual portfolios. This
contradicts our notion that there should be a diversification benefit associated with 
merging the portfolios; it also means that a decentralization of risk management 
using VaR is difficult since we cannot be sure that by aggregating VaR numbers for 
different portfolios or business units we will obtain a bound for the overall risk of 
the enterprise. 
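
A simple numerical case shows how subadditivity can fail. The sketch below is our own toy illustration, not an example from the text: two independent positions each lose 100 with probability 4% and nothing otherwise. Each position alone has a 95% VaR of zero, yet the merged portfolio has a 95% VaR of 100.

```python
from collections import defaultdict
from itertools import product


def quantile(dist, alpha):
    """alpha-quantile inf{x : F(x) >= alpha} of a discrete df given as {value: probability}."""
    cum = 0.0
    for v in sorted(dist):
        cum += dist[v]
        if cum >= alpha:
            return v
    return max(dist)


def sum_independent(dist1, dist2):
    """Distribution of L1 + L2 for independent discrete losses L1 and L2."""
    out = defaultdict(float)
    for (x1, p1), (x2, p2) in product(dist1.items(), dist2.items()):
        out[x1 + x2] += p1 * p2
    return dict(out)


# Hypothetical position: loss of 100 with probability 0.04, no loss otherwise.
position = {0.0: 0.96, 100.0: 0.04}
merged = sum_independent(position, position)

alpha = 0.95
var_single = quantile(position, alpha)   # P(L_i <= 0) = 0.96 >= 0.95
var_merged = quantile(merged, alpha)     # P(L <= 0) = 0.9216 < 0.95

print("VaR of each position:", var_single)
print("VaR of merged portfolio:", var_merged)
print("subadditive here?", var_merged <= 2 * var_single)
```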


Model risk and market liquidity. In practice VaR numbers are often given a very 
literal interpretation, which is misleading and even dangerous; the statement that 
the daily VaR at confidence level $\alpha = 99\%$ of a particular portfolio is equal to $l$ is
understood as “with a probability of 99% the loss on this position will be smaller
than $l$”.

This interpretation is misleading for two reasons. To begin with, our estimate of 
the loss distribution is subject to estimation error and the problem of model risk. 
Model risk can be defined as the risk that a financial institution incurs losses because
its risk-management models are misspecified or because some of the assumptions 
underlying these models are not met in practice. For instance, we might work with 
a normal distribution to model losses whereas the real distribution is heavy-tailed, 
or we might fail to recognize the presence of volatility clustering or tail dependence 
(see Section 4.1.1) in modelling the distribution of the risk-factor changes. Since 
“any financial model is by definition a simplified and thus imperfect representation 
of the economic world and the ways in which economic agents perform investment, 
trading or financing decisions” (from the introduction of Gibson (2000)), it is fair 
to say that any risk-management model is subject to model risk to some extent. 
Of course, these problems are most pronounced if we try to estimate VaR at very 
high confidence levels such as $\alpha = 99.97\%$, as we might be required to do in the
determination of economic capital (see Section 1.4.3). 

Moreover, the above interpretation of VaR neglects any problems related to market 
liquidity. Loosely speaking, a market for a security is termed liquid if investors can 
buy or sell large amounts of the security in a short time without affecting its price 
very much. Conversely, a market in which an attempt to trade has a large impact 
on price, or where trading is impossible since there is no counterparty willing to 
take the other side of the trade, is termed illiquid. The problem this poses for the 
interpretation of VaR numbers was brought to the attention of professional risk 
managers by Lawrence and Robinson (1995). To quote from their paper: 


If we ask the question: “Can we be 98% confident that no more than an
amount $l$ [the VaR estimate at $\alpha = 0.98$] would be lost in liquidating
the position?” the answer must be “no”. To see why, consider what this
measure of VaR implies about the risk management process and the
nature of financial markets. In the liquidation scenario we are considering
the following sequence of events is implied: at time $t$ it is decided
to liquidate the position; during the next 24 hours nothing is done ...; 
after 24 hours of inaction the position is liquidated at prices which are 
drawn from a [pre-specified] distribution unaffected by the process of 
liquidation. This scenario is hardly credible. ... In particular, the act 
of liquidating itself would have the effect of moving the price against 
the trader disposing of a long position or closing out a short position. 
For large positions and illiquid instruments the costs of liquidation can 
be significant, in particular if speed is required. 


They conclude that “any useful measures of VaR must take into account the costs 
of liquidation on the prospective loss”. The events surrounding the near-bankruptcy 
of the hedge fund LTCM in Summer 1998 clearly showed that these concerns are 
more than justified. In fact, illiquidity of markets is nowadays regarded by many 
risk managers as the most important source of model risk. 

Ideally we should try to factor the effects of market illiquidity into formal models, 
although this is difficult for a number of reasons. First, the price impact of trading a 
particular amount of a security at a given point in time is hard to measure; it depends 
on such elusive factors as market mood or the distribution of economic information
among investors. Moreover, in illiquid markets traders are forced to close their 
position gradually over time to minimize the price impact of their transactions. 
Obviously, this liquidation has to be done on a different timescale depending on the 
size of the position to be liquidated relative to the market. This in turn would lead to 
different time horizons $\Delta$ for different positions, rendering the aggregation of risk
measures across portfolios impossible. In many practical situations the risk manager 
can therefore do no better than ignore the effect of market liquidity in computing 
VaR numbers or related risk measures and be aware of the ensuing problems in 
interpreting the results. See Notes and Comments for pointers to further theoretical 
and empirical studies of these issues. 


Choice of VaR parameters. Whenever we work with risk measures based on the 
loss distribution we have to choose an appropriate horizon $\Delta$; in the case of VaR we
also have to decide on the confidence level $\alpha$. There is of course no single optimal
value for these parameters, but there are some considerations which might influence 
our choice. 

The risk-management horizon $\Delta$ should reflect the time period over which a
financial institution is committed to hold its portfolio. This period is affected by 
contractual and legal constraints, and by liquidity considerations. It will typically 
vary across markets; in choosing a horizon for enterprise-wide risk management, a 
financial institution or corporation has little choice but to use the horizon appropri- 
ate for the market in which its core business activities lie. For instance, insurance 
companies are typically bound to hold their portfolio of claims for one year, say; in 
this time they can neither alter the portfolio by a substantial amount nor renegotiate 
the premiums they receive. Hence, in firm-wide risk management, one year is also 
the appropriate horizon for measuring the market risk of the investment portfolios 
of such companies. 

As mentioned earlier, even in the absence of contractual constraints, a financial 
institution can be forced to hold a loss-making position in a risky asset if the market 
for that asset is not very liquid. For such positions a relatively long horizon may be 
appropriate. Again, liquidity does vary across markets, and for overall risk manage- 
ment an institution has to choose a horizon which best reflects its main exposures. 

There are other, more practical considerations which suggest that $\Delta$ should be
relatively small: the use of the linearized loss operator, which simplifies many
computations, is justified only if the risk-factor changes are relatively small, which is
more likely for small $\Delta$. Similarly, the assumption that the composition of the
portfolio remains unchanged is tenable only for a small holding period. Moreover, the
calibration and testing of statistical models for the risk-factor changes $(X_t)_{t \in \mathbb{N}}$ are
easier if $\Delta$ is small, since this typically means that we have more data at our disposal.

Concerning the choice of the confidence level $\alpha$ it is again difficult to give a
clear-cut recommendation, since different values of $\alpha$ are appropriate for different
purposes. Fortunately, once we have an estimate for the loss distribution, it is
easy to compute quantiles at different confidence levels simultaneously. For
capital-adequacy purposes a high confidence level is certainly called for in order to have a
sufficient safety margin. For instance, the Basel Committee proposes the use of VaR
at the 99% level and $\Delta$ equal to 10 days for market risk. In order to set limits for
traders, a bank would typically take 95% and $\Delta$ equal to one day. The backtesting
of models producing VaR figures should also be carried out at lower confidence 
levels in order to have more observations where the realized loss is higher than the 
predicted VaR. 


Transforming VaR into regulatory capital. For banks using the internal model (IM) 
approach for market risk (MR), the following risk-capital formula results: 


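$$\mathrm{RC}^{t}_{\mathrm{IM}}(\mathrm{MR}) = \max\Big\{ \mathrm{VaR}^{t,10}_{0.99},\ \frac{k}{60} \sum_{i=1}^{60} \mathrm{VaR}^{t-i+1,10}_{0.99} \Big\} + C_{\mathrm{SR}}, \tag{2.21}$$

where $\mathrm{VaR}^{j,10}_{0.99}$ stands for a 10-day VaR at the 99% confidence level, calculated
on day $j$, and $t$ represents today. The stress factor $3 \le k \le 4$ is determined as
a function of the overall quality of the bank’s internal model. The component $C_{\mathrm{SR}}$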
stands for specific risk, i.e. the risk that is due to issuer-specific price movements after 
accounting for general market factors. A specific risk component should be added 
to all VaR numbers (see, for example, Crouhy, Galai and Mark 2001, Section 2.2). 
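
Formula (2.21) is straightforward to evaluate once a history of daily VaR figures is available. The following sketch assumes the history is stored as a list of 10-day 99% VaR numbers with today's figure last; the function name, the data layout and the numbers are our own illustrative assumptions.

```python
def regulatory_capital(var_history, k, c_sr):
    """Evaluate (2.21): max(today's VaR, k times the 60-day average VaR) plus specific risk.

    var_history: 10-day 99% VaR figures, oldest first, today's figure last (at least 60 entries).
    k:           stress factor, between 3 and 4.
    c_sr:        specific-risk component C_SR.
    """
    if len(var_history) < 60:
        raise ValueError("need at least 60 daily VaR figures")
    if not 3.0 <= k <= 4.0:
        raise ValueError("stress factor k must lie between 3 and 4")
    var_today = var_history[-1]
    average_60 = sum(var_history[-60:]) / 60.0
    return max(var_today, k * average_60) + c_sr


# Hypothetical histories: a calm period, and a calm period ending in a sharp jump.
calm = [10.0] * 60
jump = [10.0] * 59 + [50.0]

print(regulatory_capital(calm, k=3.0, c_sr=0.0))   # the averaged term dominates
print(regulatory_capital(jump, k=3.0, c_sr=5.0))   # today's VaR dominates
```

In the calm history the scaled 60-day average determines the capital figure, whereas after a sudden jump the current VaR takes over; the maximum in (2.21) captures both situations.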


A comment on VaR terminology. In practice the term “VaR” is used in various 
ways. In its most narrow sense the term Value-at-Risk refers to a quantile of the 
loss distribution as defined in Definition 2.10. Often risk managers refer to “VaR 
procedures” such as “delta-normal VaR” (see Section 2.3.1 below). A VaR procedure 
refers to a statistical approach to estimating a model for the loss distribution. Clearly, 
a VaR procedure could also be used to estimate some other risk measure based on the 
loss distribution. Finally, the term “VaR approach to risk management” is frequently 
used and usually refers to the way VaR figures are used in steering a company. In 
this book we use the term VaR only in the first sense. 


2.2.4 Other Risk Measures Based on Loss Distributions 


The purpose of this section is to discuss a few other statistical summaries of the 
loss distribution which are frequently used as risk measures in finance, insurance 
and risk management. As in the previous two sections we assume that a certain loss 
distribution $F_L$ has been fixed at the outset of the analysis.


Variance. Historically the variance of the P&L distribution has been the dominat- 
ing risk measure in finance. To a large extent this is due to the huge impact that the 
portfolio theory of Markowitz, which uses variance as a measure of risk, has had 
on theory and practice in finance (see, for example, Markowitz 1952). Variance is a 
well-understood concept which is easy to use analytically. However, as a risk mea- 
sure it has two drawbacks. On the technical side, if we want to work with variance, 
we have to assume that the second moment of the loss distribution exists. While 
unproblematic for most return distributions in finance this can cause problems in 
certain areas of non-life insurance or for the analysis of operational losses (see Sec- 
tion 10.1.4). On the conceptual side, since it makes no distinction between positive 
and negative deviations from the mean, variance is a good measure of risk only for
distributions which are (approximately) symmetric around the mean, such as the
normal distribution or a (finite-variance) Student t distribution. However, in many
areas of risk management, such as in credit and operational risk management, we 
deal with loss distributions which are highly skewed. 


Lower and upper partial moments. Partial moments are measures of risk based on 
the lower or upper part of a distribution. In most of the literature on risk management 
the main concern is with the risk inherent in the lower tail of a P&L distribution and 
lower partial moments are used to measure this risk. Under our sign convention we 
are concerned with the risk inherent in the upper tail of a loss distribution and we 
focus on upper partial moments. Given an exponent $k \ge 0$ and a reference point $q$
the upper partial moment $\mathrm{UPM}(k, q)$ is defined as


$$\mathrm{UPM}(k, q) = \int_q^\infty (l - q)^k \, dF_L(l) \in [0, \infty]. \tag{2.22}$$

Some combinations of $k$ and $q$ have a special interpretation: for $k = 0$ we obtain
$P(L \ge q)$; for $k = 1$ we obtain $E((L - q) I_{\{L \ge q\}})$; for $k = 2$ and $q = E(L)$ we
obtain the upper semivariance of $L$. Of course, the higher the value we choose for
k, the more conservative our risk measure becomes since we give more and more 
weight to large deviations from the reference point q. 
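
The special cases of (2.22) can be checked on a small sample by replacing $F_L$ with the empirical df, so that the integral becomes an average over observations at or above the reference point. This empirical version, the function name and the data are assumptions made for illustration only.

```python
def upper_partial_moment(losses, k, q):
    """Empirical UPM(k, q): average of (L - q)^k over the sample, counting only L >= q."""
    if k < 0:
        raise ValueError("the exponent k must be non-negative")
    # In Python 0.0 ** 0 equals 1.0, so for k = 0 each observation with L >= q counts once.
    return sum((loss - q) ** k for loss in losses if loss >= q) / len(losses)


# Hypothetical sample of five losses.
losses = [-2.0, 0.0, 1.0, 3.0, 6.0]
mean_loss = sum(losses) / len(losses)

print(upper_partial_moment(losses, 0, 1.0))         # fraction of losses >= 1
print(upper_partial_moment(losses, 1, 1.0))         # average excess over 1
print(upper_partial_moment(losses, 2, mean_loss))   # upper semivariance
```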


Expected shortfall. Expected shortfall is closely related to VaR. It is now preferred 
to VaR by many risk managers in practice and will be seen in Section 6.1 to overcome 
the conceptual deficiencies of the latter (related to subadditivity). 


Definition 2.15 (expected shortfall). For a loss $L$ with $E(|L|) < \infty$ and df $F_L$ the
expected shortfall at confidence level $\alpha \in (0, 1)$ is defined as


$$\mathrm{ES}_\alpha = \frac{1}{1 - \alpha} \int_\alpha^1 q_u(F_L) \, du, \tag{2.23}$$


where $q_u(F_L) = F_L^{\leftarrow}(u)$ is the quantile function of $F_L$.


Expected shortfall is thus related to VaR by


$$\mathrm{ES}_\alpha = \frac{1}{1 - \alpha} \int_\alpha^1 \mathrm{VaR}_u(L) \, du.$$


Instead of fixing a particular confidence level $\alpha$ we average VaR over all levels
$u \ge \alpha$ and thus “look further into the tail” of the loss distribution. Obviously $\mathrm{ES}_\alpha$
depends only on the distribution of $L$ and obviously $\mathrm{ES}_\alpha \ge \mathrm{VaR}_\alpha$. See Figure 2.1
for a simple illustration of an expected shortfall value and its relationship to VaR. 
The 95% expected shortfall value of 4.9 is at least double the 95% VaR value of 2.2 
in this case. 

For continuous loss distributions an even more intuitive expression can be derived 
which shows that expected shortfall can be interpreted as the expected loss that is 
incurred in the event that VaR is exceeded. 
Lemma 2.16. For an integrable loss $L$ with continuous df $F_L$ and any $\alpha \in (0, 1)$
we have


$$\mathrm{ES}_\alpha = \frac{E(L; L \ge q_\alpha(L))}{1 - \alpha} = E(L \mid L \ge \mathrm{VaR}_\alpha), \tag{2.24}$$


where we have used the notation $E(X; A) := E(X I_A)$ for a generic integrable rv
$X$ and a generic set $A \in \mathcal{F}$.


Proof. Denote by $U$ an rv with uniform distribution on the interval $[0, 1]$. It is
a well-known fact from elementary probability theory that the rv $F_L^{\leftarrow}(U)$ has df
$F_L$ (see Proposition 5.2 for a proof). We have to show that
$E(L; L \ge q_\alpha(L)) = \int_\alpha^1 F_L^{\leftarrow}(u) \, du$. Now


$$E(L; L \ge q_\alpha(L)) = E(F_L^{\leftarrow}(U); F_L^{\leftarrow}(U) \ge F_L^{\leftarrow}(\alpha)) = E(F_L^{\leftarrow}(U); U \ge \alpha);$$

in the last equality we used the fact that $F_L^{\leftarrow}$ is strictly increasing since $F_L$ is
continuous (see Proposition A.3(iii)). Thus we get $E(F_L^{\leftarrow}(U); U \ge \alpha) = \int_\alpha^1 F_L^{\leftarrow}(u) \, du$.


The second representation follows since for a continuous loss distribution $F_L$ we
have $P(L \ge q_\alpha(L)) = 1 - \alpha$.


Remark 2.17. For a discontinuous loss df $F_L$, formula (2.24) does not hold for
all $\alpha$; instead we have the more complicated expression


$$\mathrm{ES}_\alpha = \frac{1}{1 - \alpha} \big( E(L; L \ge q_\alpha) + q_\alpha (1 - \alpha - P(L \ge q_\alpha)) \big). \tag{2.25}$$


For a proof see Proposition 3.2 of Acerbi and Tasche (2002). 
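
The difference between (2.24) and (2.25) is easy to see on a discrete loss distribution. The sketch below evaluates (2.25) for a hypothetical three-point distribution of our own choosing and compares it with the conditional expectation $E(L \mid L \ge \mathrm{VaR}_\alpha)$, which is no longer equal to expected shortfall in this case.

```python
def quantile(dist, alpha):
    """alpha-quantile inf{x : F(x) >= alpha} of a discrete df given as {value: probability}."""
    cum = 0.0
    for v in sorted(dist):
        cum += dist[v]
        if cum >= alpha:
            return v
    return max(dist)


def expected_shortfall_discrete(dist, alpha):
    """Expected shortfall via formula (2.25), valid also when the df has jumps."""
    q = quantile(dist, alpha)
    tail_mean = sum(v * p for v, p in dist.items() if v >= q)   # E(L; L >= q_alpha)
    tail_prob = sum(p for v, p in dist.items() if v >= q)       # P(L >= q_alpha)
    return (tail_mean + q * (1.0 - alpha - tail_prob)) / (1.0 - alpha)


def conditional_tail_mean(dist, alpha):
    """E(L | L >= VaR_alpha), which agrees with ES only for continuous dfs."""
    q = quantile(dist, alpha)
    tail_mean = sum(v * p for v, p in dist.items() if v >= q)
    tail_prob = sum(p for v, p in dist.items() if v >= q)
    return tail_mean / tail_prob


# Hypothetical discrete loss distribution.
dist = {0.0: 0.90, 50.0: 0.06, 100.0: 0.04}
alpha = 0.95

print("VaR:", quantile(dist, alpha))
print("ES from (2.25):", expected_shortfall_discrete(dist, alpha))
print("E(L | L >= VaR):", conditional_tail_mean(dist, alpha))
```

Here the quantile function equals 50 on $(0.95, 0.96]$ and 100 on $(0.96, 1)$, so averaging it over $[0.95, 1]$ as in (2.23) gives $(0.01 \cdot 50 + 0.04 \cdot 100)/0.05 = 90$, in agreement with (2.25), whereas the conditional expectation is only 70.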


We use Lemma 2.16 to calculate the expected shortfall for two common contin- 
uous distributions. 


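Example 2.18 (expected shortfall for Gaussian loss distribution). Suppose that
the loss distribution $F_L$ is normal with mean $\mu$ and variance $\sigma^2$. Fix $\alpha \in (0, 1)$.
Then

$$\mathrm{ES}_\alpha = \mu + \sigma \frac{\phi(\Phi^{-1}(\alpha))}{1 - \alpha}, \tag{2.26}$$


where $\phi$ is the density of the standard normal distribution. The proof is elementary.
First note that


$$\mathrm{ES}_\alpha = \mu + \sigma E\Big( \frac{L - \mu}{\sigma} \,\Big|\, \frac{L - \mu}{\sigma} \ge q_\alpha\Big( \frac{L - \mu}{\sigma} \Big) \Big);$$


hence it suffices to compute the expected shortfall for the standard normal rv
$\tilde{L} := (L - \mu)/\sigma$. Here we get


$$\mathrm{ES}_\alpha(\tilde{L}) = \frac{1}{1 - \alpha} \int_{\Phi^{-1}(\alpha)}^\infty l \phi(l) \, dl = \frac{1}{1 - \alpha} \big[ -\phi(l) \big]_{\Phi^{-1}(\alpha)}^\infty = \frac{\phi(\Phi^{-1}(\alpha))}{1 - \alpha}.$$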
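
The closed form (2.26) can be cross-checked against the definition (2.23) by averaging normal quantiles numerically over $[\alpha, 1]$. The sketch below does this with a midpoint rule; the grid size and the function names are arbitrary choices of ours, and the agreement is only approximate because the quantile function is unbounded near $u = 1$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def es_normal(alpha, mu=0.0, sigma=1.0):
    """Closed form (2.26): mu + sigma * phi(Phi^{-1}(alpha)) / (1 - alpha)."""
    z = STD_NORMAL.inv_cdf(alpha)
    return mu + sigma * STD_NORMAL.pdf(z) / (1.0 - alpha)


def es_by_averaging_var(alpha, mu=0.0, sigma=1.0, n=20_000):
    """Definition (2.23): average VaR_u over u in [alpha, 1], using a midpoint rule.

    The midpoints stay strictly below u = 1, where the normal quantile is infinite.
    """
    h = (1.0 - alpha) / n
    total = sum(mu + sigma * STD_NORMAL.inv_cdf(alpha + (i + 0.5) * h) for i in range(n))
    return total * h / (1.0 - alpha)


for alpha in (0.95, 0.99):
    print(alpha, round(es_normal(alpha), 4), round(es_by_averaging_var(alpha), 4),
          round(STD_NORMAL.inv_cdf(alpha), 4))   # last column: VaR of the standard normal
```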


Example 2.19 (expected shortfall for Student t loss distribution). Suppose the
loss $L$ is such that $\tilde{L} = (L - \mu)/\sigma$ has a standard t distribution with $\nu$ degrees of
freedom, as in Example 2.14. Suppose further that $\nu > 1$. By the reasoning of
Example 2.18, which applies to any location-scale family, we have $\mathrm{ES}_\alpha = \mu + \sigma \mathrm{ES}_\alpha(\tilde{L})$.
The expected shortfall of the standard t distribution is easily calculated by direct
integration to be


$$\mathrm{ES}_\alpha(\tilde{L}) = \frac{g_\nu(t_\nu^{-1}(\alpha))}{1 - \alpha} \Big( \frac{\nu + (t_\nu^{-1}(\alpha))^2}{\nu - 1} \Big), \tag{2.27}$$


where $t_\nu$ denotes the df and $g_\nu$ the density of standard t.


The following lemma gives a kind of law of large numbers for expected shortfall 
in terms of order statistics. 


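Lemma 2.20. For a sequence $(L_i)_{i \in \mathbb{N}}$ of iid rvs with df $F_L$ we have


$$\lim_{n \to \infty} \frac{\sum_{i=1}^{[n(1-\alpha)]} L_{i,n}}{[n(1-\alpha)]} = \mathrm{ES}_\alpha \quad \text{a.s.}, \tag{2.28}$$


where $L_{1,n} \ge \cdots \ge L_{n,n}$ are the order statistics of $L_1, \ldots, L_n$ and where $[n(1-\alpha)]$
denotes the largest integer not exceeding $n(1 - \alpha)$.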


In other words, expected shortfall at confidence level $\alpha$ can be thought of as
the limiting average of the $[n(1 - \alpha)]$ upper order statistics from a sample of size $n$
from the loss distribution. This representation suggests an obvious way of estimating
expected shortfall in the situation when we have large samples and $[n(1 - \alpha)]$ is a
relatively large number. This is generally not the case in practice, except perhaps 
when the Monte Carlo approach to risk estimation is used (see Section 2.3.3). A 
proof of Lemma 2.20 may be found in Proposition 4.1 of Acerbi and Tasche (2002). 
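
The estimator suggested by Lemma 2.20 can be tried out in a Monte Carlo setting. The sketch below draws a large standard normal sample, averages the $[n(1 - \alpha)]$ largest values and compares the result with the closed form (2.26). The sample size, the seed and the small rounding guard inside the floor are our own assumptions.

```python
import math
import random
from statistics import NormalDist


def es_order_statistics(sample, alpha):
    """Average of the [n(1 - alpha)] upper order statistics, as in (2.28)."""
    n = len(sample)
    # The tiny offset guards against n * (1 - alpha) falling just below an integer in floating point.
    m = math.floor(n * (1.0 - alpha) + 1e-9)
    if m < 1:
        raise ValueError("sample too small for this confidence level")
    upper = sorted(sample, reverse=True)[:m]
    return sum(upper) / m


rng = random.Random(12345)                       # fixed seed for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

alpha = 0.95
estimate = es_order_statistics(sample, alpha)
z = NormalDist().inv_cdf(alpha)
exact = NormalDist().pdf(z) / (1.0 - alpha)      # closed form (2.26) with mu = 0, sigma = 1

print("order-statistics estimate:", round(estimate, 4))
print("closed form:", round(exact, 4))
```

With 200 000 draws the average is taken over 10 000 order statistics; with a few hundred observations, as is common for historical data, only a handful of values would enter the average and the estimate would be much noisier.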

Since $\mathrm{ES}_\alpha$ can be thought of as an average over all losses that are greater than
or equal to $\mathrm{VaR}_\alpha$, it is sensitive to the severity of losses exceeding $\mathrm{VaR}_\alpha$. This
advantage of expected shortfall is illustrated in the following example. 


Example 2.21 (VaR and ES for stock returns). We consider daily losses on a
position in a particular stock; the current value of the position equals $V_t = 10\,000$.
Recall from Example 2.4 that the loss for this portfolio is given by $L^\Delta_{t+1} = -V_t X_{t+1}$,
where $X_{t+1}$ represents daily log-returns of the stock. We assume that $X_{t+1}$ has mean
zero and standard deviation $\sigma = 0.2/\sqrt{250}$, i.e. we assume that the stock has an
annualized volatility of 20%. We compare two different models for the distribution,
namely (i) a normal distribution and (ii) a t distribution with $\nu = 4$ degrees of
freedom scaled to have standard deviation $\sigma$. The t distribution is a symmetric
distribution with heavy tails, so that large absolute values are much more probable
than in the normal model; it is also a distribution that has been shown to fit well in
many empirical studies (see Example 3.15). In Table 2.1 we present $\mathrm{VaR}_\alpha$ and $\mathrm{ES}_\alpha$
for both models and various values of $\alpha$. In case (i) these values have been computed
using (2.26); the expected shortfall for the t model has been computed using (2.27).

Most risk managers would argue that the t model is riskier than the normal model,
since under the t distribution large losses are more likely. However, if we use VaR at
the 95% or 97.5% confidence level to measure risk, the normal distribution appears
to be at least as risky as the t model; only above a confidence level of 99% does
the higher risk in the tails of the t model become apparent. On the other hand, if
Table 2.1. $\mathrm{VaR}_\alpha$ and $\mathrm{ES}_\alpha$ in normal and t model for different values of $\alpha$.


$\alpha$                                 0.90    0.95    0.975   0.99    0.995
$\mathrm{VaR}_\alpha$ (normal model)     162.1   208.1   247.9   294.3   325.8
$\mathrm{VaR}_\alpha$ (t model)          137.1   190.7   248.3   335.1   411.8
$\mathrm{ES}_\alpha$ (normal model)      222.0   260.9   295.7   337.2   365.8
$\mathrm{ES}_\alpha$ (t model)           223.4   286.3   356.7   465.8   563.5


we use expected shortfall, the risk in the tails of the t model is reflected in our risk
measurement for lower values of $\alpha$. Of course, simply going to a 99% confidence
level in quoting VaR numbers does not help to overcome this deficiency of VaR, as 
there are other examples where the higher risk becomes apparent only for confidence 
levels beyond 99%. 
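
The calculations behind Example 2.21 can be sketched with the standard library alone. For the t model we rely on two facts that are not stated in the text and are specific to $\nu = 4$: the quantile has a closed trigonometric form, and the density is $g_4(x) = \tfrac{3}{8}(1 + x^2/4)^{-5/2}$. The function names are our own. The output can be compared with Table 2.1: in our check the VaR rows agreed with the table to within rounding, while evaluating (2.27) gave t-model ES values slightly above the tabulated ones, so the comparison should be treated as approximate.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
NU = 4                                          # degrees of freedom of the t model
V = 10_000.0                                    # position value V_t
SIGMA = 0.2 / math.sqrt(250.0)                  # daily standard deviation of the log-return
SCALE_T = SIGMA * math.sqrt((NU - 2) / NU)      # t scale parameter giving standard deviation SIGMA


def t4_quantile(alpha):
    """Closed-form quantile of the standard t distribution with 4 degrees of freedom."""
    a = 4.0 * alpha * (1.0 - alpha)
    q = math.cos(math.acos(math.sqrt(a)) / 3.0) / math.sqrt(a)
    return math.copysign(2.0 * math.sqrt(max(q - 1.0, 0.0)), alpha - 0.5)


def t4_density(x):
    """Density g_4 of the standard t distribution with 4 degrees of freedom."""
    return 0.375 * (1.0 + x * x / 4.0) ** -2.5


def normal_var_es(alpha):
    """VaR from (2.19) and ES from (2.26) for the loss V * X with X normal, mean zero."""
    z = STD_NORMAL.inv_cdf(alpha)
    return V * SIGMA * z, V * SIGMA * STD_NORMAL.pdf(z) / (1.0 - alpha)


def t_var_es(alpha):
    """VaR from (2.20) and ES from (2.27) for the loss V * X with X a scaled t(4), mean zero."""
    x = t4_quantile(alpha)
    es_standard = t4_density(x) / (1.0 - alpha) * (NU + x * x) / (NU - 1)
    return V * SCALE_T * x, V * SCALE_T * es_standard


print("alpha   VaR(normal)  VaR(t)   ES(normal)  ES(t)")
for alpha in (0.90, 0.95, 0.975, 0.99, 0.995):
    var_n, es_n = normal_var_es(alpha)
    var_t, es_t = t_var_es(alpha)
    print(f"{alpha:<7} {var_n:10.1f} {var_t:8.1f} {es_n:11.1f} {es_t:7.1f}")
```

Because the return distribution is symmetric with mean zero in both models, the loss $-V_t X_{t+1}$ has the same distribution as $V_t X_{t+1}$, which is why the code works directly with upper quantiles of $X_{t+1}$.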


Remark 2.22. It is possible to derive results on the asymptotics of the
shortfall-to-quantile ratio $\mathrm{ES}_\alpha / \mathrm{VaR}_\alpha$ for $\alpha \to 1$. For the normal distribution we have
$\lim_{\alpha \to 1} \mathrm{ES}_\alpha / \mathrm{VaR}_\alpha = 1$; for the t distribution with $\nu > 1$ degrees of freedom
we have $\lim_{\alpha \to 1} \mathrm{ES}_\alpha / \mathrm{VaR}_\alpha = \nu/(\nu - 1) > 1$. This shows that for a heavy-tailed
distribution the difference between ES and VaR is more pronounced than for the 
normal distribution. We will take up this issue in more detail in Section 7.2.3. 
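
The limits in Remark 2.22 can be explored numerically. The sketch below tabulates the ratio for the standard normal and for the standard t distribution with $\nu = 4$, where the limit should be $4/3$. As an assumption outside the text, it uses the closed-form quantile and density that are available for $\nu = 4$; the convergence for the normal distribution is visibly slow.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ratio_normal(alpha):
    """ES_alpha / VaR_alpha for the standard normal, from (2.26) and (2.19)."""
    z = STD_NORMAL.inv_cdf(alpha)
    return STD_NORMAL.pdf(z) / (1.0 - alpha) / z


def t4_quantile(alpha):
    """Closed-form quantile of the standard t distribution with 4 degrees of freedom."""
    a = 4.0 * alpha * (1.0 - alpha)
    q = math.cos(math.acos(math.sqrt(a)) / 3.0) / math.sqrt(a)
    return math.copysign(2.0 * math.sqrt(max(q - 1.0, 0.0)), alpha - 0.5)


def ratio_t4(alpha):
    """ES_alpha / VaR_alpha for the standard t with 4 degrees of freedom, from (2.27) and (2.20)."""
    x = t4_quantile(alpha)
    density = 0.375 * (1.0 + x * x / 4.0) ** -2.5
    es = density / (1.0 - alpha) * (4.0 + x * x) / 3.0
    return es / x


print("alpha    normal   t(4)")
for alpha in (0.95, 0.99, 0.999, 0.9999):
    print(f"{alpha:<8} {ratio_normal(alpha):.4f}   {ratio_t4(alpha):.4f}")
print("limit    1.0000  ", f"{4.0 / 3.0:.4f}")
```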


Notes and Comments 


An extensive discussion of different approaches to risk quantification is given in 
Crouhy, Galai and Mark (2001). Value-at-Risk was introduced by JPMorgan in the 
first version of its RiskMetrics system and was quickly accepted by risk managers 
and regulators as industry standard. Expected shortfall was made popular by Artzner 
et al. (1997, 1999). In the latter paper an important axiomatic approach to risk mea- 
sures was developed; we will discuss their work in Section 6.1. There are a number 
of variants on the expected shortfall risk measure with a variety of names, such as 
tail conditional expectation (TCE), worst conditional expectation (WCE) and condi- 
tional VaR (CVaR); all coincide for continuous loss distributions. Acerbi and Tasche 
(2002) discuss the relationships between the various notions. Risk measures based 
on loss distributions also appear in the literature under the (somewhat unfortunate) 
heading of law-invariant risk measures. 

A class of risk measures very much in use throughout the hedge fund industry is 
based on the peak-to-bottom loss over a given period of time in the performance curve 
of an investment. These measures are typically referred to as (maximal) drawdown 
risk measures (see, for example, Chekhlov, Uryasev and Zabarankin 2005; Jaeger 
2005). 

The measurement of financial risk and the computation of actuarial premiums are 
at least conceptually closely related problems, so that the actuarial literature on pre- 
mium principles is of relevance in financial risk management. We refer to Chapter 3 
of Rolski et al. (1999) for an overview; Goovaerts, De Vylder and Haezendonck 
(1984) provide a specialist account. 
Model risk has become a central issue in modern risk management. The essays
collected in Gibson (2000) give a good overview of the academic research on this 
issue. The problems faced by the hedge fund LTCM in 1998 provide a prime example 
of model risk in VaR-based risk-management systems. While LTCM had a seemingly 
sophisticated VaR system in place, errors in parameter estimation, unexpectedly 
large market moves (heavy tails) and in particular vanishing market liquidity drove 
the hedge fund into near-bankruptcy, causing major financial turbulence around the 
globe. Jorion (2000) contains an excellent discussion of the LTCM case, in particular 
comparing a Gaussian-based VaR model with a t-based approach. At a more general 
level, Jorion (2002b) discusses the various fallacies surrounding VaR-based market 
risk management systems. 

Most of the academic literature on liquidity focuses on the determinants of bid-ask
spreads and/or transaction cost (see, for example, the survey by Stoll (2000)).
Risk-management and hedging issues in illiquid markets have received relatively 
little attention: optimal strategies for unwinding a position in an illiquid market 
are discussed in Almgren and Chriss (2001); the hedging of derivatives in illiquid 
markets has been studied by Jarrow (1994), Frey (1998, 2000), Schönbucher and
Wilmott (2000) and Bank and Baum (2004), among others. 


2.3 Standard Methods for Market Risks 


In the following sections we discuss some standard methods used in the financial 
industry for measuring market risk over short time intervals, such as a day or a 
fortnight. In the formal framework of Section 2.1.1 this amounts to the problem 
of estimating risk measures for the loss distribution of a loss $L_{t+1} = l_{[t]}(X_{t+1})$,
where $X_{t+1}$ is the vector of risk-factor changes from time $t$ to time $t + 1$ and $l_{[t]}$
is the loss operator based on the portfolio at time $t$; the risk measures on which we
concentrate are VaR (Definition 2.10) and expected shortfall (Definition 2.15). We
recall from Section 2.1.2 that the issue of whether we base our risk measurement on
the unconditional loss distribution of $L_{t+1}$ or the conditional loss distribution based
on information denoted by $\mathcal{F}_t$ is relevant. In presenting the standard methods we
clarify which of these approaches is generally adopted. 


2.3.1 Variance-Covariance Method


We present a generic version of this method which may be turned into an unconditional
or conditional method by varying the procedure that is used to estimate certain
key inputs. The risk-factor changes $X_{t+1}$ are assumed to have a multivariate normal
distribution (either unconditionally or conditionally) denoted by $X_{t+1} \sim N_d(\mu, \Sigma)$,
where $\mu$ is the mean vector and $\Sigma$ the covariance (or variance-covariance) matrix of
the distribution. The properties of the multivariate normal distribution are discussed 
in detail in Section 3.1.3. 

We assume that the linearized loss in terms of the risk factors is a sufficiently
accurate approximation of the actual loss and simplify the problem by considering
the distribution of $L^\Delta_{t+1} = l^\Delta_{[t]}(X_{t+1})$ with $l^\Delta_{[t]}$ defined in (2.6). The linearized loss
operator will be a function with general structure

$$l^\Delta_{[t]}(x) = -(c_t + b_t' x) \tag{2.29}$$


for some constant $c_t$ and constant vector $b_t$, which are known to us at time $t$. For
a concrete example, consider the stock portfolio of Example 2.4 where the loss
operator takes the form $l^\Delta_{[t]}(x) = -V_t w_t' x$ and $w_t$ is the vector of portfolio weights
at time $t$.

An important property of the multivariate normal is that a linear function (2.29)
of $X_{t+1}$ must have a univariate normal distribution. From general rules on the mean
and variance of linear combinations of a random vector we obtain that


$$L^\Delta_{t+1} = l^\Delta_{[t]}(X_{t+1}) \sim N(-c_t - b_t' \mu, \ b_t' \Sigma b_t). \tag{2.30}$$


Value-at-Risk may be easily calculated for this loss distribution using (2.19) in 
Example 2.14. Expected shortfall may be calculated using (2.26) in Example 2.18. 
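
The steps leading from (2.29) to VaR and expected shortfall can be sketched for a two-stock portfolio as follows. The weights, mean vector and covariance matrix are hypothetical inputs chosen for illustration, and plain Python lists are used in place of a matrix library.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def linearized_loss_moments(c, b, mu, cov):
    """Mean -c - b'mu and variance b' Sigma b of the linearized loss, as in (2.30)."""
    d = len(b)
    mean = -c - sum(b[i] * mu[i] for i in range(d))
    variance = sum(b[i] * cov[i][j] * b[j] for i in range(d) for j in range(d))
    return mean, variance


def var_es_normal(alpha, mean, variance):
    """VaR from (2.19) and expected shortfall from (2.26) for a normal loss."""
    sd = math.sqrt(variance)
    z = STD_NORMAL.inv_cdf(alpha)
    return mean + sd * z, mean + sd * STD_NORMAL.pdf(z) / (1.0 - alpha)


# Hypothetical stock portfolio: loss operator -V w'x, so c = 0 and b = V w.
V = 10_000.0
weights = [0.6, 0.4]
b = [V * w for w in weights]
mu = [0.0, 0.0]                        # assumed zero mean daily log-returns
cov = [[1.0e-4, 1.0e-4],               # daily volatilities of 1% and 2%, correlation 0.5
       [1.0e-4, 4.0e-4]]

mean, variance = linearized_loss_moments(0.0, b, mu, cov)
var_99, es_99 = var_es_normal(0.99, mean, variance)
print("loss standard deviation:", round(math.sqrt(variance), 2))
print("99% VaR:", round(var_99, 2), " 99% ES:", round(es_99, 2))
```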

To turn this into a practical procedure we require estimates of $\mu$ and $\Sigma$ based on
historical risk-factor change data $X_{t-n+1}, \ldots, X_t$. If we simply estimate $\mu$ and $\Sigma$ by
calculating the sample mean vector and sample covariance matrix, then this amounts 
to an analysis of the unconditional loss distribution under the tacit assumption that 
the risk-factor change data come from a stationary process. The standard sample 
estimates of mean and covariance are reviewed in Section 3.1.2. 

To obtain a conditional version of this method we treat the data as a realization
of a multivariate time series and assume that $X_{t+1} \mid \mathcal{F}_t \sim N_d(\mu_{t+1}, \Sigma_{t+1})$, where
$\mu_{t+1}$ and $\Sigma_{t+1}$ now denote the conditional mean and covariance matrix given
information to time $t$. We obtain estimates of these moments for substitution in (2.30)
by forecasting. This might involve the formal estimation of a time series model, 
such as a multivariate GARCH model, and the use of model-based prediction meth- 
ods. Alternatively, a more informal forecasting technique, such as the exponentially 
weighted moving-average (EWMA) procedure popularized in JPMorgan’s Risk- 
Metrics, might be used. The use of these techniques is discussed in greater detail in 
Chapter 4. 


Weaknesses of the method. The variance-covariance method offers a simple analytical
solution to the risk-measurement problem but this convenience is achieved at
the cost of two crude simplifying assumptions. First, linearization may not always 
offer a good approximation of the relationship between the true loss distribution and 
the risk-factor changes, as discussed at the end of Section 2.1.1. Second, the assump- 
tion of normality is unlikely to be realistic for the distribution of the risk-factor 
changes, certainly for daily data and probably also for weekly and even monthly data. 
A stylized fact of empirical finance suggests that the distribution of financial risk 
factor returns is leptokurtic and heavier-tailed than the Gaussian distribution. Later, 
in Example 3.3, we present evidence for this observation in an analysis of daily, 
weekly, monthly and quarterly stock returns. The implication is that an assumption 
of Gaussian risk factors will tend to underestimate the tail of the loss distribution 
and measures of risk, like VaR and expected shortfall, that are based on this tail. 
This criticism also applies to the conditional version of the variance-covariance
method. Even when we attempt an explicit time series modelling of the return 
data, analyses mostly suggest that the conditional distribution of risk-factor changes 
for the next time period, given information up to the present, is not multivariate 
Gaussian, but rather a distribution whose margins have heavier tails. Another way 
of putting this is to say that the innovation distribution of the time series model is 
generally heavier-tailed than normal (see Example 4.24). 


Extensions of the method. The convenience of the method relies on the fact that a 
linear combination of a multivariate Gaussian vector has a univariate Gaussian dis- 
tribution. However, there are other multivariate distribution families that are closed 
under linear operations, and variance-covariance methods can also be developed 
for these. Examples include multivariate t distributions and multivariate generalized 
hyperbolic distributions, which we describe in detail in Chapter 3. 

For example, suppose we model risk-factor changes (either unconditionally or
conditionally) with a multivariate t distribution denoted $X_{t+1} \sim t_d(\nu, \mu, \Sigma)$, where
this notation is explained in Example 3.7 and Section 3.4. Then the analogous
expression to (2.30) is


$$L^\Delta_{t+1} = l^\Delta_{[t]}(X_{t+1}) \sim t(\nu,\, -c_t - b_t'\mu,\, b_t'\Sigma b_t), \tag{2.31}$$


and risk measures can be calculated using (2.20) and (2.27). 


2.3.2 Historical Simulation 


Instead of estimating the distribution of $L_{t+1} = l_{[t]}(X_{t+1})$ under some explicit
parametric model for $X_{t+1}$, the historical-simulation method can be thought of as
estimating the distribution of the loss operator under the empirical distribution of data
$X_{t-n+1}, \ldots, X_t$. The method can be concisely described using the loss-operator
notation; we construct a univariate dataset by applying the operator to each of our
historical observations of the risk-factor change vector to get a set of historically
simulated losses:


$$\{\tilde L_s = l_{[t]}(X_s) : s = t-n+1, \ldots, t\}. \tag{2.32}$$


The values $\tilde L_s$ show what would happen to the current portfolio if the risk-factor
changes on day s were to recur. We make inference about the loss distribution and
risk measures using these historically simulated loss data.

This is an unconditional method. If we assume that the process of risk-factor
changes is stationary with df $F_X$, then (subject to further technical conditions) the
empirical df of the data is a consistent estimator of $F_X$. Hence the empirical df of the
data $\tilde L_{t-n+1}, \ldots, \tilde L_t$ is a consistent estimator of the df of $l_{[t]}(X)$ under $F_X$. More
formally, an appropriate version of the strong law of large numbers for time series
can be used to show that, as $n \to \infty$,

$$F_n(l) := \frac{1}{n}\sum_{s=t-n+1}^{t} I_{\{\tilde L_s \le l\}} = \frac{1}{n}\sum_{s=t-n+1}^{t} I_{\{l_{[t]}(X_s) \le l\}} \to P(l_{[t]}(X) \le l) = F_L(l),$$

where X is a generic vector of risk-factor changes with distribution $F_X$ and
$L := l_{[t]}(X)$.

In practice there are various ways we can use the historically simulated loss
data. It is common to estimate VaR using the method of empirical quantile estimation,
whereby theoretical quantiles of the loss distribution are estimated by sample
quantiles of the data. If we denote the ordered values of the data in (2.32)
by $\tilde L_{n,n} \le \cdots \le \tilde L_{1,n}$, one possible estimator of $\mathrm{VaR}_\alpha(L)$ is $\tilde L_{[n(1-\alpha)],n}$, where
$[n(1-\alpha)]$ denotes the largest integer not exceeding $n(1-\alpha)$. For example, if
$n = 1000$ and $\alpha = 0.99$, we would estimate the VaR by taking the 10th largest
value. To estimate the associated expected shortfall an obvious empirical estimator
following from the representation (2.28) would be the average of the 10 largest
losses. As an alternative, particularly in situations where n is relatively modest in
size, we could fit a parametric univariate distribution to the data (2.32) and calculate
risk measures analytically from this distribution.
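
The empirical estimators just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `empirical_var_es` and the toy loss data are hypothetical, and the losses are taken as already computed by applying the loss operator to the historical risk-factor changes.

```python
import math

def empirical_var_es(losses, alpha):
    """Empirical VaR and expected shortfall from historically simulated losses.

    VaR is the k-th largest loss with k = [n(1 - alpha)], and expected
    shortfall is the average of the k largest losses.
    """
    n = len(losses)
    # Round before flooring to guard against floating-point noise,
    # e.g. 1000 * (1 - 0.9) evaluating to 99.99999999999997.
    k = math.floor(round(n * (1 - alpha), 9))
    if k < 1:
        raise ValueError("need n(1 - alpha) >= 1 for this estimator")
    ordered = sorted(losses, reverse=True)   # ordered[0] is the largest loss
    var = ordered[k - 1]                     # k-th largest loss
    es = sum(ordered[:k]) / k                # mean of the k largest losses
    return var, es

# Toy data (hypothetical): the losses 1, 2, ..., 1000, so n = 1000.
toy_losses = list(range(1, 1001))
var_99, es_99 = empirical_var_es(toy_losses, 0.99)
print(var_99, es_99)
```

With n = 1000 and alpha = 0.99 the estimator picks out the 10th largest loss and averages the 10 largest losses, matching the example in the text.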


Strengths and weaknesses of the method. The historical-simulation method has 
obvious attractions: it is easy to implement and reduces the risk-measure estimation 
problem to a one-dimensional problem; no statistical estimation of the multivariate 
distribution of X is necessary, and no assumptions about the dependence structure 
of risk-factor changes are made. 

However, the success of the approach is highly dependent on our ability to collect 
sufficient quantities of relevant, synchronized data for all risk factors. Whenever 
there are gaps in the risk-factor history, or whenever new risk factors are introduced 
into the modelling, there may be problems filling the gaps and completing the 
historical record. These problems will tend to reduce the effective value of n and 
mean that empirical estimates of VaR and expected shortfall have very poor accuracy. 
Ideally we want n to be fairly large since the method is an unconditional method 
and we want a number of extreme scenarios in the historical record to provide more 
informative estimates of the tail of the loss distribution. Indeed the method has 
been referred to as “driving a car while looking through the rear view mirror”; this 
obvious deficiency, which is shared by all purely statistical procedures, could be 
compensated for by adding historical extreme events to the available database or by 
formulating relevant extreme scenarios. 

The fact that the method is an unconditional method could be seen as a further 
weakness; we have remarked in Section 2.1.2 that the conditional approach is gen- 
erally considered to be the more relevant for day-to-day market risk management. 


Extensions of the method. Simple empirical estimates of the VaR and especially 
the expected shortfall are likely to be inaccurate, particularly in situations where n 
is of modest size (say only a few years of daily data). Moreover, the approach of 
fitting parametric univariate distributions to the historically simulated losses may 
not result in models that provide a particularly good fit in the tail area where our 
risk-measure estimates are calculated. A possible solution to this problem is to use 
the techniques of extreme value theory (EVT) to provide estimates of the tail of 
the loss distribution that are as faithful as possible to the most extreme data and
that use parametric forms that are supported by theory. In Chapter 7 we describe a
standard EVT method based on the generalized Pareto distribution that is useful in 
this context. 

It is possible to develop conditional approaches based on the basic template of 
historical simulation. One simple approach might be to model the historically sim- 
ulated data in (2.32) with a univariate time series and to use this model to calculate 
conditional estimates for the loss $L_{t+1} = l_{[t]}(X_{t+1})$. Formally speaking, this is not
quite the conditional approach as we have previously defined it: we do not consider
the conditional distribution of $L_{t+1}$ conditional on $\mathcal{F}_t$, the sigma field generated by
$(X_s)_{s \le t}$, but rather conditional on, say, $\mathcal{G}_t$, the sigma field generated by $(l_{[t]}(X_s))_{s \le t}$,
which is a less rich information set. In practice, however, this simple method may 
often work well. See Notes and Comments for references to another conditional 
version of historical simulation. 


2.3.3 Monte Carlo 


The Monte Carlo method is a rather general name for any approach to risk mea- 
surement that involves the simulation of an explicit parametric model for risk-factor 
changes. As such, the method can be either conditional or unconditional depending 
on whether the model adopted is a dynamic time series model for risk-factor changes 
or a static distributional model.

The first step of the method is the choice of the model and the calibration of this 
model to historical risk-factor change data $X_{t-n+1}, \ldots, X_t$. Obviously it should be
a model from which we can readily simulate, since in the second stage we generate
m independent realizations of risk-factor changes for the next time period, which
we denote by $\tilde X^{(1)}_{t+1}, \ldots, \tilde X^{(m)}_{t+1}$.

In a similar fashion to the historical-simulation method, we apply the loss operator
to these simulated vectors to obtain simulated realizations
$\{\tilde L^{(i)}_{t+1} = l_{[t]}(\tilde X^{(i)}_{t+1}) : i = 1, \ldots, m\}$ from the loss distribution. These simulated loss data are used to
estimate risk measures; very often this is done by simple empirical quantile and 
shortfall estimation, as described above, but it would again be possible to base the 
inference on fitted univariate distributions or to use an extreme value model to model 
the tail of the simulated losses. Note that the use of Monte Carlo means that we are 
free to choose the number of replications m ourselves, within the obvious constraints 
of computation time. Generally m can be chosen to be much larger than n so that 
we obtain more accuracy in empirical VaR and expected shortfall estimates than is 
possible in the case of historical simulation. 
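
The two stages can be sketched as follows. This is a minimal illustration, not a recommended model: it assumes a static multivariate Gaussian model calibrated by the sample mean and covariance, and the synthetic history, the linear loss operator and all function names are hypothetical choices.

```python
import math
import random

def sample_mean_cov(data):
    """Sample mean vector and covariance matrix of the rows of data."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    cov = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in data) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def cholesky(a):
    """Lower-triangular factor low with low * low' = a (a positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def monte_carlo_losses(history, loss_operator, m, rng):
    """Stage 1: calibrate a Gaussian model; stage 2: simulate m losses."""
    mean, cov = sample_mean_cov(history)
    low = cholesky(cov)
    d = len(mean)
    losses = []
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        losses.append(loss_operator(x))
    return losses

# Synthetic history of two correlated risk-factor changes (hypothetical data).
rng = random.Random(0)
history = []
for _ in range(500):
    x1 = rng.gauss(0.0, 0.01)
    history.append([x1, 0.5 * x1 + rng.gauss(0.0, 0.01)])

def linear_loss(x):
    """Hypothetical linearized loss operator with equal weights."""
    return -(0.5 * x[0] + 0.5 * x[1])

m = 20000                                   # m can be much larger than n = 500
mc_losses = monte_carlo_losses(history, linear_loss, m, rng)
k = math.floor(round(m * (1 - 0.99), 9))    # number of tail observations
tail = sorted(mc_losses, reverse=True)[:k]
var_99, es_99 = tail[-1], sum(tail) / k     # empirical 99% VaR and ES
print(var_99, es_99)
```

The accuracy gained from a large m concerns only the simulation error; the estimates remain only as good as the calibrated model.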


Weaknesses of the method. The method does not solve the problem of finding a 
multivariate model for $X_{t+1}$ and any results that are obtained will only be as good
as the model that is used. In a market risk context a dynamic model seems desirable 
and some kind of GARCH structure with a heavy-tailed multivariate conditional 
distribution, such as multivariate t, might be considered. The models we describe 
in Section 4.6 provide possible candidates. 

For large portfolios the computational cost of the Monte Carlo approach can be 
considerable, as every simulation requires the revaluation of the portfolio. This is
particularly problematic if the portfolio contains many derivatives which cannot be
priced in closed form. Variance-reduction techniques such as importance sampling 
can be of help here. We discuss the application of importance sampling in models 
for credit risk management in Section 8.5; further references on variance-reduction 
techniques are given in Notes and Comments. 


2.3.4 Losses over Several Periods and Scaling 


So far we have considered one-period loss distributions and associated risk mea- 
sures. It is often the case that we would like to infer risk measures for the loss 
distribution over several periods from a model for single-period losses. For exam- 
ple, suppose that we work with a model for daily risk-factor changes which is set up 
to allow calculations of a daily VaR and expected shortfall. We might want to also 
obtain estimates of VaR and expected shortfall for the one-week or one-month loss 
distribution assuming that the portfolio is held constant throughout that time. 

An obvious approach is to aggregate daily risk-factor change data in order to obtain 
risk-factor change data at a lower frequency and to make a one-period estimation 
using these data. Clearly, this results in a reduction in the number of data and 
necessitates a new analysis of the aggregated data. The former problem can be 
avoided by the formation of overlapping risk-factor returns (a construction that is 
described in Section 4.1) but this is not really recommended as it introduces new 
serial dependencies into the data that greatly complicate statistical modelling. 


Scaling. It would be far more attractive if we had simple rules for transforming
one-period risk measures into h-period risk measures for $h > 1$. Suppose we denote
the loss from time t over the next h periods by $L^{(h)}_{t+h}$. Arguing as in (2.1) and (2.3)
we have


$$\begin{aligned}
L^{(h)}_{t+h} &= -(V_{t+h} - V_t) = -(f(t+h, Z_{t+h}) - f(t, Z_t)) \\
&= -(f(t+h, Z_t + X_{t+1} + \cdots + X_{t+h}) - f(t, Z_t)) \\
&=: l^{(h)}_{[t]}\Big(\sum_{i=1}^{h} X_{t+i}\Big),
\end{aligned}$$


where $l^{(h)}_{[t]}$ represents a loss operator at time t for the h-period loss. The general
question of interest is how risk measures applied to the distribution of $L^{(h)}_{t+h}$ scale
with h, and this has no simple answer except in special cases.

Note that the h-period loss operator differs from the one-period loss operator in
situations where the mapping depends explicitly on time (such as derivative portfolios).
For simplicity let us consider the case in which the mapping does not depend
on calendar time, so that $l^{(h)}_{[t]}(x) = l_{[t]}(x)$. The linearized form of this operator will
be $l^\Delta_{[t]}(x) = b_t'x$ for some vector $b_t$, which is known at time t. We look at the simpler
problem of scaling for risk measures applied to the linearized loss distribution:


$$L^{(h)\Delta}_{t+h} = l^\Delta_{[t]}\Big(\sum_{i=1}^{h} X_{t+i}\Big) = \sum_{i=1}^{h} b_t'X_{t+i}. \tag{2.33}$$

The following example shows a special case where we do have a very simple scaling,
known as the square-root-of-time rule.


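Example 2.23 (square-root-of-time scaling). Suppose the risk-factor change vectors
are iid with distribution $N_d(0, \Sigma)$. Then $\sum_{i=1}^{h} X_{t+i} \sim N_d(0, h\Sigma)$ and the
distribution of $L^{(h)\Delta}_{t+h}$ in (2.33) satisfies (both conditionally and unconditionally)
$L^{(h)\Delta}_{t+h} \sim N(0, h\,b_t'\Sigma b_t)$. It then follows easily from (2.19) and (2.26) that both quantiles
and expected shortfalls for this distribution scale according to the square root
of time ($\sqrt{h}$). For example, writing $\mathrm{ES}^{(h)}_\alpha$ for the expected shortfall, we have


$$\mathrm{ES}^{(h)}_\alpha = \sqrt{h}\,\sigma\,\frac{\phi(\Phi^{-1}(\alpha))}{1-\alpha},$$

where $\sigma^2 = b_t'\Sigma b_t$. Clearly, $\mathrm{ES}^{(h)}_\alpha = \sqrt{h}\,\mathrm{ES}^{(1)}_\alpha$ and, with similar notation,
$\mathrm{VaR}^{(h)}_\alpha = \sqrt{h}\,\mathrm{VaR}^{(1)}_\alpha$.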
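
The scaling in Example 2.23 can be checked numerically. The sketch below uses the standard Gaussian formulas for VaR and expected shortfall of a mean-zero normal loss; the weight vector `b`, the covariance matrix `Sigma` and the function name are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def gaussian_var_es(sigma, alpha, h=1):
    """VaR and ES of an N(0, h * sigma^2) loss at confidence level alpha."""
    std = NormalDist()
    q = std.inv_cdf(alpha)                   # Phi^{-1}(alpha)
    scale = math.sqrt(h) * sigma             # standard deviation of the h-period loss
    var = scale * q
    es = scale * std.pdf(q) / (1 - alpha)    # sqrt(h) * sigma * phi(q) / (1 - alpha)
    return var, es

# Hypothetical linearized portfolio: sigma^2 = b' Sigma b.
b = [0.3, 0.7]
Sigma = [[0.0001, 0.00002],
         [0.00002, 0.0004]]
sigma = math.sqrt(sum(b[i] * Sigma[i][j] * b[j] for i in range(2) for j in range(2)))

var_1, es_1 = gaussian_var_es(sigma, 0.99)
var_10, es_10 = gaussian_var_es(sigma, 0.99, h=10)
print(var_10 / var_1, es_10 / es_1, math.sqrt(10))
```

Both ratios coincide with the square root of h, which holds here only because of the iid Gaussian assumption.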


This scaling rule is quite commonly used in practice and is easily implemented 
in the context of the variance-covariance method. However, empirical risk-factor
change data generally support neither a Gaussian distributional assumption nor an 
iid assumption. It is a stylized fact of empirical finance that, although financial 
risk-factor changes possess low serial correlation, they show patterns of changing 
volatility that are not consistent with an iid model (see Section 4.1). To obtain rea- 
sonable models for risk-factor change data we require dynamic time series models, 
such as models from the GARCH family. However, relatively little is known about 
the scaling of risk measures under such models. When considering the distribution of 
the h-period loss $L^{(h)}_{t+h}$ (or its linearized form) we have to be aware that the scaling of
risk measures applied to this distribution will also depend on whether we consider
the unconditional distribution or the conditional distribution given $\mathcal{F}_t$. Very little
theory exists for either question but empirical studies suggest that the true scaling 
can be very different from square-root-of-time scaling (see Notes and Comments 
for more on this). 


Monte Carlo approach. It is possible to use a Monte Carlo approach to the problem
of determining risk measures for the h-period loss distribution. Suppose we have
a model for risk-factor changes, either distributional or dynamic, depending on
whether we are performing an unconditional or conditional analysis.

In the dynamic case we simulate future paths of the process $\tilde X^{(j)}_{t+1}, \ldots, \tilde X^{(j)}_{t+h}$
for $j = 1, \ldots, m$, where m is a predetermined large number of replications. (In
the unconditional case we would simply simulate realizations from a multivariate
distribution.) We then apply the h-period loss operator to these simulated data to
obtain Monte Carlo simulated losses:

$$\Big\{\tilde L^{(h)(j)}_{t+h} = l^{(h)}_{[t]}\Big(\sum_{i=1}^{h} \tilde X^{(j)}_{t+i}\Big) : j = 1, \ldots, m\Big\}.$$

These are used to make statistical inference about the loss distribution and associated 
risk measures, as described in Section 2.3.3. 
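
As a sketch of the dynamic case, the code below simulates h-period losses for a single risk factor. Everything specific here is an assumption made for illustration: the GARCH(1,1) recursion with Gaussian innovations, the parameter values `omega`, `a1`, `b1`, the current variance forecast `sigma2_next` and the linear loss operator $l(x) = -x$ are hypothetical and not taken from the text.

```python
import math
import random

def simulate_h_period_losses(sigma2_next, h, m, rng, omega=1e-6, a1=0.10, b1=0.80):
    """Simulate m h-period losses -(X_{t+1} + ... + X_{t+h}) along GARCH(1,1) paths."""
    losses = []
    for _ in range(m):
        sigma2, cumulative = sigma2_next, 0.0
        for _step in range(h):
            x = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)   # next risk-factor change
            cumulative += x
            sigma2 = omega + a1 * x * x + b1 * sigma2     # update conditional variance
        losses.append(-cumulative)                         # loss operator l(x) = -x
    return losses

def empirical_var(losses, alpha):
    """k-th largest simulated loss with k = [m(1 - alpha)]."""
    k = math.floor(round(len(losses) * (1 - alpha), 9))
    return sorted(losses, reverse=True)[k - 1]

rng = random.Random(1)
sigma2_next = 4e-4        # hypothetical: volatility currently well above its long-run level
m = 20000
losses_1 = simulate_h_period_losses(sigma2_next, 1, m, rng)
losses_10 = simulate_h_period_losses(sigma2_next, 10, m, rng)
var_1 = empirical_var(losses_1, 0.99)
var_10 = empirical_var(losses_10, 0.99)
# Compare the simulated 10-period VaR with the square-root-of-time extrapolation.
print(var_10, math.sqrt(10) * var_1)
```

Because the simulated volatility reverts towards its long-run level, the simulated 10-period VaR need not agree with the square-root-of-time extrapolation of the one-period conditional VaR.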


2.3.5 Backtesting


In the preceding sections we have considered standard methods for estimating risk 
measures at a time t for the distribution of losses in the next period. When this
procedure is continually implemented over time we have the opportunity to monitor 
the performance of methods and compare their relative performance. This process 
of monitoring is known as backtesting. 

Suppose that at time t we make estimates of both VaR and expected shortfall for 
one period and h periods. We denote the true one-period risk measures by $\mathrm{VaR}^t_\alpha$
and $\mathrm{ES}^t_\alpha$, and the true h-period measures by $\mathrm{VaR}^{t,h}_\alpha$ and $\mathrm{ES}^{t,h}_\alpha$. These may be unconditional
or conditional risk measures, but for the purposes of this section we leave
this unspecified. At time t + 1 we have the opportunity to compare our one-period
estimates with what actually happened; at time t + h we have the opportunity to do 
the same for the h-period estimates. 

By definition of VaR (and assuming a continuous loss distribution) we have that
$P(L_{t+1} > \mathrm{VaR}^t_\alpha) = 1 - \alpha$, so that the probability of a so-called violation of VaR is
$1 - \alpha$. In practice the risk measures have to be estimated from data and we introduce
an indicator notation for violations of the VaR estimates:


$$\hat I_{t+1} := I_{\{L_{t+1} > \widehat{\mathrm{VaR}}^t_\alpha\}}, \qquad \hat I_{t+h} := I_{\{L^{(h)}_{t+h} > \widehat{\mathrm{VaR}}^{t,h}_\alpha\}}. \tag{2.34}$$


We expect that if our estimation method is reasonable then these indicators should
behave like Bernoulli random variables with success (i.e. violation) probability close
to $(1 - \alpha)$. If we conduct multiple comparisons of VaR predictions and corresponding
realized losses, then we expect the proportion of occasions on which VaR is
violated to be about $1 - \alpha$.

In more specific situations we can say more. For example, if we form one-step- 
ahead estimates of a conditional one-period VaR using a dynamic approach, then 
we expect that the violation indicators $\hat I_{t+1}$ in (2.34) should behave like iid Bernoulli
rvs with expectation $(1 - \alpha)$; the number of violations over m time periods should
be binomial with expected value $m(1 - \alpha)$. This will be discussed in more detail in
Section 4.4.3. 
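
A minimal sketch of this check is given below. The function names are hypothetical, and the one-sided binomial tail probability is used here only as a simple illustration of how surprising a violation count is under the iid Bernoulli assumption; the violation counts 26 and 19 over 260 days are the 95% figures for 2002 quoted for two methods in Section 2.3.6.

```python
import math

def count_violations(losses, var_estimates):
    """Number of periods in which the realized loss exceeded the VaR estimate."""
    return sum(1 for loss, var in zip(losses, var_estimates) if loss > var)

def binomial_upper_tail(k, m, p):
    """P(N >= k) for N ~ Binomial(m, p)."""
    return sum(math.comb(m, j) * p**j * (1 - p)**(m - j) for j in range(k, m + 1))

# Toy check of the indicator logic: only the second loss exceeds its VaR estimate.
toy_count = count_violations([1.0, 3.0, 2.0], [2.0, 2.0, 2.0])

m, alpha = 260, 0.95
expected = m * (1 - alpha)                           # expected number of violations
p_26 = binomial_upper_tail(26, m, 1 - alpha)         # 26 violations observed
p_19 = binomial_upper_tail(19, m, 1 - alpha)         # 19 violations observed
print(expected, p_26, p_19)
```

A very small tail probability indicates that the observed number of violations is hard to reconcile with a violation probability of $1 - \alpha$.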

We would also like to be able to backtest the success of our expected shortfall 
estimation. Considering, for simplicity, the one-period expected shortfall estimate, 
it follows from Lemma 2.16 that for a continuous loss distribution the identity 


$$E\big((L_{t+1} - \mathrm{ES}^t_\alpha)\, I_{\{L_{t+1} > \mathrm{VaR}^t_\alpha\}}\big) = 0$$


is satisfied. This suggests we look at the discrepancy $L_{t+1} - \widehat{\mathrm{ES}}^t_\alpha$ on days when the
estimated VaR is violated. These should come from a distribution with mean zero. 
Under further modelling assumptions we look at this idea in more detail in Sec- 
tion 4.4.3. 
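
The idea can be sketched as follows; the function name and the toy numbers are hypothetical, and no formal test is implied.

```python
def es_discrepancies(losses, var_estimates, es_estimates):
    """Discrepancies (loss minus ES estimate) on days when the VaR estimate is violated."""
    return [loss - es
            for loss, var, es in zip(losses, var_estimates, es_estimates)
            if loss > var]

# Toy data (hypothetical): constant VaR estimate 3 and ES estimate 6.
toy_losses = [1.0, 5.0, 2.0, 7.0]
discrepancies = es_discrepancies(toy_losses, [3.0] * 4, [6.0] * 4)
mean_discrepancy = sum(discrepancies) / len(discrepancies)
print(discrepancies, mean_discrepancy)
```

A sample mean of the discrepancies far from zero would suggest that the expected shortfall estimates are systematically too low or too high.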


2.3.6 An Illustrative Example 


We conclude the chapter by giving an example that illustrates some of the ideas
we have mentioned and which sets the scene for material presented in Chapters 3
and 4. We consider the application of methods belonging to the general categories of
variance-covariance and historical-simulation methods to the portfolio of an investor
in international equity indexes. The investor is assumed to have domestic currency
sterling (GBP) and to invest in the Financial Times 100 Shares Index (FTSE 100), the
Standard & Poor’s 500 (S&P 500) and the Swiss Market Index (SMI). The investor
thus has currency exposure to US dollars (USD) and Swiss francs (CHF) and the
value of the portfolio is influenced by five risk factors (three log index values and
two log exchange rates). The corresponding risk-factor return time series for the
period 1992-2003 are shown in Figure 2.2.

Figure 2.2. Time series of risk-factor changes. These are log-returns on (a) the FTSE 100,
(b) the S&P 500 and (c) the SMI indexes, as well as log-returns for (d) the GBP/USD and
(e) the GBP/CHF exchange rates for the period 1992-2003.

On any day t we standardize the total portfolio value $V_t$ in sterling to be one and
consider that the portfolio weights (the proportions of this total value invested in each
of the indexes FTSE 100, S&P 500, SMI) are 30%, 40% and 30%, respectively. Using
similar reasoning to that in Example 2.4, it may be verified that the loss operator is


$$l_{[t]}(x) = 1 - (0.3e^{x_1} + 0.4e^{x_2+x_4} + 0.3e^{x_3+x_5}),$$

and its linearized version is


$$l^\Delta_{[t]}(x) = -(0.3x_1 + 0.4(x_2 + x_4) + 0.3(x_3 + x_5)),$$

where $x_1$, $x_2$ and $x_3$ represent log-returns on the three indexes and $x_4$ and $x_5$ are
log-returns on the GBP/USD and GBP/CHF exchange rates.
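
The two operators are easy to code and compare. In the sketch below the function names and the vector of risk-factor changes `x_example` are hypothetical; the weights are those of the example.

```python
import math

def loss_operator(x):
    """Loss operator of the example portfolio (weights 30%, 40%, 30%)."""
    x1, x2, x3, x4, x5 = x
    return 1.0 - (0.3 * math.exp(x1) + 0.4 * math.exp(x2 + x4) + 0.3 * math.exp(x3 + x5))

def linearized_loss_operator(x):
    """First-order approximation of loss_operator around x = 0."""
    x1, x2, x3, x4, x5 = x
    return -(0.3 * x1 + 0.4 * (x2 + x4) + 0.3 * (x3 + x5))

# Hypothetical one-day log-returns on the three indexes and two exchange rates.
x_example = [-0.01, -0.02, 0.005, 0.01, -0.003]
full_loss = loss_operator(x_example)
linear_loss = linearized_loss_operator(x_example)
print(full_loss, linear_loss, linear_loss - full_loss)
```

For daily changes of this size the two values differ only at second order, and since $e^y \ge 1 + y$ the linearized loss is never smaller than the full loss for this long-only portfolio.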

Our objective is to calculate VaR estimates at the 95% and 99% levels for all 
trading days in the period 1996-2003. Where local public holidays take place in 
individual markets (e.g. the Fourth of July in the US) we record artificial zero 
returns for the market in question, thus preserving around 260 days of risk-factor 
return data in each year. We use the last 1000 days of historical data $X_{t-999}, \ldots, X_t$
to make all VaR estimates for day t + 1 with the following methods. 


VC. The standard unconditional variance-covariance method assuming multivariate
Gaussian risk-factor changes as described in Section 2.3.1.


HS. The standard unconditional historical simulation method as described in
Section 2.3.2.


VC-t. An unconditional variance-covariance method in which a multivariate t
distribution is fitted to the risk-factor change data (see Chapter 3, and Sections 3.2.4
and 3.2.5 in particular).


HS-GARCH. A conditional version of the historical simulation method in which
GARCH(1, 1) models with a constant conditional mean term and Gaussian innovations
are fitted to the historically simulated losses to estimate the volatility of
the next day’s loss (see Chapter 4, and Section 4.4.2 in particular).


VC-MGARCH. A conditional version of the variance-covariance method in which
a multivariate GARCH model (a first-order constant conditional correlation
model) with multivariate normal innovations is used to estimate the conditional
covariance matrix of the next day’s risk-factor changes (see Chapter 4, and
Section 4.6 in particular).


HS-EWMA. A conditional method, similar to HS-GARCH, in which the EWMA
method rather than a GARCH model is used to estimate volatility (see
Sections 4.4.1 and 4.4.2).


VC-EWMA. A similar method to VC-MGARCH but a multivariate version of the
EWMA method is used to estimate the conditional covariance matrix of the next
day’s risk-factor changes (see Section 4.6.6).


HS-GARCH-t. A similar method to HS-GARCH but Student t innovations are
assumed in the GARCH model.


VC-MGARCH-t. A similar method to VC-MGARCH but multivariate t innova- 
tions are used in the MGARCH model. 


HS-CONDEVT. A conditional method using a combination of GARCH modelling 
and EVT (extreme value theory) (see Section 7.2.6). 


This collection of methods is of course far from complete and is merely meant 
as an indication of the kinds of strategies that are possible. In particular, we have 
confined our interest to rather simple GARCH models and not added, for example, 
asymmetric innovation distributions or leverage effects (see Section 4.3.3), which 
can often further improve the performance of such methods.

Table 2.2. Numbers of violations of the 95% and 99% VaR estimates calculated using various
methods, as described in Section 2.3.6. The error column shows for each method the average
absolute discrepancy per year between observed and expected numbers of violations.


Year                          1996  1997  1998  1999  2000  2001  2002  2003
Trading days                   261   260   259   260   259   260   260   260   Error

Results for 95% VaR

Expected no. of violations      13    13    13    13    13    13    13    13
VC                              13    30    29    15    13    20    27     6    7.88
HS                              14    30    31    16    14    20    26     8    8.12
VC-t                            14    32    35    19    16    23    29     8   10.25
HS-GARCH                        15    17    15    15    14    21    19    11    3.38
VC-MGARCH                       16    19    19    15    15    21    21    12    4.50
HS-EWMA                         16    14    15    15    17    23    18    11    3.62
VC-EWMA                         15    13    15    14    18    22    17     9    3.38
HS-GARCH-t                      16    18    16    15    15    21    19    12    3.75
VC-MGARCH-t                     18    19    19    17    16    23    21    11    5.50
HS-CONDEVT                      14    16    15    15    14    18    18    10    2.75

Results for 99% VaR

Expected no. of violations     2.6   2.6   2.6   2.6   2.6   2.6   2.6   2.6
VC                               5    11    20     5     2     6    12     2    5.58
HS                               3    10    13     3     2     3     7     1    3.20
VC-t                             3    11    15     4     2     4     9     1    4.08
HS-GARCH                        10     7     7     6     5     4     5     3    3.27
VC-MGARCH                        8     8     7     6     3     5     7     4    3.40
HS-EWMA                          9     5     6     6     6     6     3     2    2.92
VC-EWMA                          9     5     6     6     5     5     3     3    2.65
HS-GARCH-t                       7     5     5     5     4     3     4     2    1.93
VC-MGARCH-t                      7     5     6     4     2     1     4     1    2.10
HS-CONDEVT                       5     4     5     5     2     2     3     2    1.35
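
The error column can be recomputed from the yearly counts. The sketch below assumes that the rounded expected counts shown in the table (13 and 2.6 per year) are used rather than the exact product of trading days and $1 - \alpha$; the function name is hypothetical.

```python
def average_absolute_discrepancy(violations, expected):
    """Average over years of |observed - expected| numbers of violations."""
    return sum(abs(v - expected) for v in violations) / len(violations)

# Yearly violation counts 1996-2003 taken from Table 2.2.
vc_95 = [13, 30, 29, 15, 13, 20, 27, 6]
hs_condevt_99 = [5, 4, 5, 5, 2, 2, 3, 2]

error_vc_95 = average_absolute_discrepancy(vc_95, 13)
error_hs_condevt_99 = average_absolute_discrepancy(hs_condevt_99, 2.6)
print(error_vc_95, error_hs_condevt_99)
```

Under this assumption the values agree with the tabulated errors for these two rows after rounding to two decimals.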


From the results collected in Table 2.2 we conclude that the three unconditional 
methods (VC, HS and VC-t) are generally outperformed by the conditional methods. 
In particular, the years 1997, 1998 and 2002 are handled poorly by the unconditional 
methods and give rise to too many violations of the 95% and 99% VaR estimates. 
Historical simulation is preferred to variance-covariance at the 99% level but gives
a poor performance compared with variance-covariance at the 95% level. Basing
the unconditional variance-covariance method on a multivariate t distribution gives
an improvement at the 99% level but actually makes things worse at the 95% level.

The simple univariate GARCH procedures using the historically simulated data
work quite well; t innovations are preferred to Gaussian innovations at the 99%
level. The simpler method of volatility estimation using EWMA competes well with
the full GARCH estimation. The best method of all is HS-CONDEVT, combining
extreme value theory with GARCH modelling. The multivariate GARCH procedures
do not offer any improvement on the univariate procedures in this particular example.

In Figure 2.3 we have singled out the year 2002 and shown actual losses together
with risk-measure estimates and violations for two of the methods: HS and
HS-GARCH-t. In this volatile year, the standard historical-simulation method did not
perform well: there are 26 violations of the 95% VaR estimate and 7 violations of the
99% VaR estimate, or about twice as many as would be expected. The HS-GARCH-t
method, being a conditional method, is able to respond to the changes in volatility
throughout 2002 and consequently gives 19 and 4 violations; this is still a few more
than expected at the 95% level but is a good performance at the 99% level.

Figure 2.3. Daily losses for 2002 together with risk-measure estimates ((a) 95% VaR
estimates, (b) 99% VaR estimates) and violations for the HS and HS-GARCH-t methods. The
HS VaR estimates are indicated by a solid line and violations are indicated by circles; the
HS-GARCH-t estimates are given by a dotted line with triangles for violations. For more
information see Section 2.3.6.


Notes and Comments 


Standard methods for market risk are described in detail in Jorion (2001) and Crouhy, 
Galai and Mark (2001). For the variance-covariance approach, particularly in a
dynamic form using EWMA, see Mina and Xiao (2001). 

Another conditional version of historical simulation is used by Hull and White 
(1998) and Barone-Adesi, Bourgoin and Giannopoulos (1998). To describe this 
method succinctly we anticipate some of the notation used in Section 4.6. Suppose 
that we consider a simple model of risk-factor changes of the form $X_t = \Delta_t Z_t$,
where $\Delta_t$ is a diagonal matrix containing so-called volatilities and the $Z_t$ are iid
vectors of innovations. We would like to apply historical simulation to the innovations
but these are unobserved. Univariate time series models (typically GARCH
models) are applied to each time series of risk-factor changes; this effectively gives
us estimates of the volatility matrices $\{\hat\Delta_s : s = t-n+1, \ldots, t\}$ and allows us to
make a prediction $\hat\Delta_{t+1}$ of the volatility matrix in the next time period. We then
construct residuals $\{\hat Z_s = \hat\Delta_s^{-1} X_s : s = t-n+1, \ldots, t\}$, which are treated like
observations of the unobserved innovations. To make statistical inference about the
distribution of $L_{t+1} = l_{[t]}(X_{t+1}) = l_{[t]}(\Delta_{t+1} Z_{t+1})$ given $\mathcal{F}_t$ we use the historical-simulation
data $\{l_{[t]}(\hat\Delta_{t+1} \hat Z_s) : s = t-n+1, \ldots, t\}$.
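
The steps of this conditional version can be sketched for a single risk factor. Note the substitutions made here for brevity: an EWMA recursion stands in for the GARCH models mentioned above, and the smoothing constant `lam`, the initialization of the variance, the synthetic data and the loss operator $l(x) = -x$ are all hypothetical choices.

```python
import math
import random

def ewma_volatilities(xs, lam=0.94):
    """Volatility estimate for each observation plus a one-step-ahead forecast."""
    var = sum(x * x for x in xs) / len(xs)   # assumption: start from the sample second moment
    vols = []
    for x in xs:
        vols.append(math.sqrt(var))          # volatility estimate for this observation
        var = lam * var + (1 - lam) * x * x  # EWMA update
    return vols, math.sqrt(var)              # last value is the forecast for t + 1

def filtered_hs_losses(xs, loss_operator, lam=0.94):
    """Rescale standardized residuals by the volatility forecast and apply the loss operator."""
    vols, vol_next = ewma_volatilities(xs, lam)
    residuals = [x / v for x, v in zip(xs, vols)]          # estimated innovations
    return [loss_operator(vol_next * z) for z in residuals]

def neg(x):
    """Hypothetical loss operator for a unit position in one risk factor."""
    return -x

rng = random.Random(7)
xs = [rng.gauss(0.0, 0.01) for _ in range(250)]            # synthetic risk-factor changes
fhs_losses = filtered_hs_losses(xs, neg)
print(len(fhs_losses), max(fhs_losses))
```

Risk measures would then be estimated from these rescaled losses exactly as for ordinary historical simulation.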

The book by Glasserman (2003a) is an excellent general introduction to sim- 
ulation techniques in finance. Glasserman, Heidelberger and Shahabuddin (1999) 
present efficient numerical techniques (based on delta-gamma approximations and 
advanced simulation techniques) for computing VaR for derivative portfolios in the 
presence of heavy-tailed risk factors. 

A useful summary of scaling results for market risk measures may be found in 
Kaufmann (2004) (see also Brummelhuis and Kaufmann 2004; Embrechts, Kauf- 
mann and Patie 2005). In these papers the message emerges that, for unconditional 
VaR scaling over longer time horizons, the square-root-of-time rule often works well. 
On the other hand, for conditional VaR scaling over short time horizons, McNeil and 
Frey (2000) present evidence against square-root-of-time scaling. For further com- 
ments on these and further scaling issues, see Diebold et al. (1998) and Danielsson 
and de Vries (1997c). Literature on backtesting is given in the Notes and Comments 
section of Section 4.4. 


3 


Multivariate Models 


Financial risk models, whether for market or credit risks, are inherently multivariate. 
The value change of a portfolio of traded instruments over a fixed time horizon 
depends on a random vector of risk-factor changes or returns. The loss incurred by 
a credit portfolio depends on a random vector of losses for the individual counter- 
parties in the portfolio. In this chapter we consider some models for random vectors 
that are particularly useful for financial data. We do this from a static, distributional 
point of view without considering time series aspects, which are introduced later in 
Chapter 4. 

A stochastic model for a random vector can be thought of as simultaneously pro- 
viding probabilistic descriptions of the behaviour of the components of the random 
vector and of their dependence or correlation structure. The issue of modelling 
dependent risk factors is by no means straightforward, particularly when we move 
away from the multivariate normal distribution and simple generalizations thereof. 
We provide a more in-depth discussion of some of the subtler issues surrounding 
dependence in Chapter 5, where we introduce the subject of copulas. 

The first section of this chapter reviews basic ideas in multivariate statistics and 
discusses the multivariate normal (or Gaussian) distribution and its deficiencies as 
a model for empirical return data. 

In Section 3.2 we consider a generalization of the multivariate normal distribution 
known as a multivariate normal mixture distribution, which shares much of the 
structure of the multivariate normal and retains many of its properties. We treat 
both variance mixtures, which belong to the wider class of elliptical distributions, 
and mean-variance mixtures, which allow asymmetry. Concrete examples include 
t distributions and generalized hyperbolic distributions and we show in empirical 
examples that these models provide a better fit than a Gaussian distribution to asset 
return data. In some cases multivariate return data are not strongly asymmetric and 
models from the class of elliptical distributions are good enough; in Section 3.3 we 
review the elegant properties of these distributions. 

In the final section we discuss the important issue of dimension reduction tech- 
niques for reducing large sets of risk factors to smaller subsets of essential risk 
drivers. The key idea here is that of a factor model, and we also review the principal 
components method of constructing factors. 


3.1 Basics of Multivariate Modelling 


This first section reviews important basic material from multivariate statistics, which 
will be known to many readers. The main topic of the section is the multivariate
normal distribution and its properties; this distribution is central to much of classical
multivariate analysis and was the starting point for attempts to model market risk
(the variance-covariance method of Section 2.3.1).


3.1.1 Random Vectors and Their Distributions 


Joint and marginal distributions. Consider a general d-dimensional random vector 
of risk-factor changes (or so-called returns) $X = (X_1, \ldots, X_d)'$. The distribution
of X is completely described by the joint distribution function (df)


$$F_X(x) = F_X(x_1, \ldots, x_d) = P(X \le x) = P(X_1 \le x_1, \ldots, X_d \le x_d).$$


Where no ambiguity arises we simply write F, omitting the subscript. 

The marginal distribution function of $X_i$, written $F_{X_i}$ or often simply $F_i$, is the
df of that risk factor considered individually and is easily calculated from the joint
df. For all i we have


$$F_i(x_i) = P(X_i \le x_i) = F(\infty, \ldots, \infty, x_i, \infty, \ldots, \infty). \tag{3.1}$$


If the marginal df $F_i(x)$ is absolutely continuous, then we refer to its derivative $f_i(x)$
as the marginal density of $X_i$. It is also possible to define k-dimensional marginal
distributions of X for $2 \le k \le d-1$. Suppose we partition X into $(\mathbf{X}_1', \mathbf{X}_2')'$, where
$\mathbf{X}_1 = (X_1, \ldots, X_k)'$ and $\mathbf{X}_2 = (X_{k+1}, \ldots, X_d)'$; then the marginal distribution
function of $\mathbf{X}_1$ is


$$F_{\mathbf{X}_1}(\mathbf{x}_1) = P(\mathbf{X}_1 \le \mathbf{x}_1) = F(x_1, \ldots, x_k, \infty, \ldots, \infty).$$


For bivariate and other low-dimensional margins it is convenient to have a simpler
alternative notation in which, for example, $F_{ij}(x_i, x_j)$ stands for the marginal
distribution of the components $X_i$ and $X_j$.

The df of a random vector $\mathbf{X}$ is said to be absolutely continuous if


$$F(x_1, \ldots, x_d) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_d} f(u_1, \ldots, u_d) \, du_1 \cdots du_d$$


for some non-negative function f, which is then known as the joint density of X. 
Note that the existence of a joint density implies the existence of marginal densities 
for all k-dimensional marginals. However, the existence of a joint density is not 
necessarily implied by the existence of marginal densities (counterexamples can be 
found in Chapter 5 on copulas). 

In some situations it is convenient to work with the survival function of $\mathbf{X}$ defined
by


$$\bar{F}_{\mathbf{X}}(\mathbf{x}) = \bar{F}_{\mathbf{X}}(x_1, \ldots, x_d) = P(\mathbf{X} > \mathbf{x}) = P(X_1 > x_1, \ldots, X_d > x_d),$$


and written simply as $\bar{F}$ when no ambiguity arises. The marginal survival function
of $X_i$, written $\bar{F}_{X_i}$ or often simply $\bar{F}_i$, is given by


$$\bar{F}_i(x_i) = P(X_i > x_i) = \bar{F}(-\infty, \ldots, -\infty, x_i, -\infty, \ldots, -\infty).$$


Conditional distributions and independence. If we have a multivariate model for
risks in the form of a joint df, survival function or density, then we have implicitly
described their dependence structure. We can make conditional probability statements
about the probability that certain components take certain values given that
other components take other values. For example, consider again our partition of $\mathbf{X}$
into $(\mathbf{X}_1', \mathbf{X}_2')'$ and assume absolute continuity of the df of $\mathbf{X}$. Let $f_{\mathbf{X}_1}$ denote the
joint density of the k-dimensional marginal distribution $F_{\mathbf{X}_1}$. Then the conditional
distribution of $\mathbf{X}_2$ given $\mathbf{X}_1 = \mathbf{x}_1$ has density


$$f_{\mathbf{X}_2 \mid \mathbf{X}_1}(\mathbf{x}_2 \mid \mathbf{x}_1) = \frac{f(\mathbf{x}_1, \mathbf{x}_2)}{f_{\mathbf{X}_1}(\mathbf{x}_1)}, \tag{3.2}$$


and the corresponding df is


$$F_{\mathbf{X}_2 \mid \mathbf{X}_1}(\mathbf{x}_2 \mid \mathbf{x}_1) = \int_{u_{k+1}=-\infty}^{x_{k+1}} \cdots \int_{u_d=-\infty}^{x_d} \frac{f(x_1, \ldots, x_k, u_{k+1}, \ldots, u_d)}{f_{\mathbf{X}_1}(\mathbf{x}_1)} \, du_{k+1} \cdots du_d.$$


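If the joint density of $\mathbf{X}$ factorizes into $f(\mathbf{x}) = f_{\mathbf{X}_1}(\mathbf{x}_1) f_{\mathbf{X}_2}(\mathbf{x}_2)$, then the conditional
distribution and density of $\mathbf{X}_2$ given $\mathbf{X}_1 = \mathbf{x}_1$ are identical to the marginal
distribution and density of $\mathbf{X}_2$: in other words, $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent. We recall
that $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent if and only if


$$F(\mathbf{x}) = F_{\mathbf{X}_1}(\mathbf{x}_1) F_{\mathbf{X}_2}(\mathbf{x}_2), \quad \forall \mathbf{x},$$


or, in the case where $\mathbf{X}$ possesses a joint density, $f(\mathbf{x}) = f_{\mathbf{X}_1}(\mathbf{x}_1) f_{\mathbf{X}_2}(\mathbf{x}_2)$.
The components of $\mathbf{X}$ are mutually independent if and only if $F(\mathbf{x}) = \prod_{i=1}^d F_i(x_i)$
for all $\mathbf{x} \in \mathbb{R}^d$ or, in the case where $\mathbf{X}$ possesses a density, $f(\mathbf{x}) = \prod_{i=1}^d f_i(x_i)$.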


Moments and characteristic function. The mean vector of $\mathbf{X}$, when it exists, is
given by
$$E(\mathbf{X}) := (E(X_1), \ldots, E(X_d))'.$$


The covariance matrix, when it exists, is the matrix $\operatorname{cov}(\mathbf{X})$ defined by
$$\operatorname{cov}(\mathbf{X}) := E\big((\mathbf{X} - E(\mathbf{X}))(\mathbf{X} - E(\mathbf{X}))'\big),$$


where the expectation operator acts componentwise on matrices. If we write $\Sigma$ for
$\operatorname{cov}(\mathbf{X})$, then the $(i, j)$th element of this matrix is


$$\sigma_{ij} = \operatorname{cov}(X_i, X_j) = E(X_i X_j) - E(X_i) E(X_j),$$


the ordinary pairwise covariance between $X_i$ and $X_j$. The diagonal elements
$\sigma_{11}, \ldots, \sigma_{dd}$ are the variances of the components of $\mathbf{X}$.

The correlation matrix of $\mathbf{X}$, denoted by $\rho(\mathbf{X})$, can be defined by introducing a
standardized vector $\mathbf{Y}$ such that $Y_i = X_i / \sqrt{\operatorname{var}(X_i)}$ for all $i$ and taking $\rho(\mathbf{X}) :=
\operatorname{cov}(\mathbf{Y})$. If we write $P$ for $\rho(\mathbf{X})$, then the $(i, j)$th element of this matrix is


$$\rho_{ij} = \rho(X_i, X_j) = \frac{\operatorname{cov}(X_i, X_j)}{\sqrt{\operatorname{var}(X_i) \operatorname{var}(X_j)}}, \tag{3.3}$$


the ordinary pairwise linear correlation of $X_i$ and $X_j$. To express the relationship
between correlation and covariance matrices in matrix form it is useful to introduce
operators on a covariance matrix $\Sigma$ as follows:


$$\Delta(\Sigma) := \operatorname{diag}(\sqrt{\sigma_{11}}, \ldots, \sqrt{\sigma_{dd}}), \tag{3.4}$$
$$\wp(\Sigma) := (\Delta(\Sigma))^{-1} \, \Sigma \, (\Delta(\Sigma))^{-1}. \tag{3.5}$$


Thus $\Delta(\Sigma)$ extracts from $\Sigma$ a diagonal matrix of standard deviations, and $\wp(\Sigma)$
extracts a correlation matrix. The covariance and correlation matrices $\Sigma$ and $P$ of
$\mathbf{X}$ are related by

$$P = \wp(\Sigma). \tag{3.6}$$
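
The two operators in (3.4) and (3.5) are simple to mirror numerically. The following is a minimal sketch using NumPy; the function names `delta` and `cov_to_corr` and the example matrix are illustrative assumptions, not notation from the text.

```python
import numpy as np

def delta(sigma):
    """Diagonal matrix of standard deviations, mirroring the operator in (3.4)."""
    return np.diag(np.sqrt(np.diag(sigma)))

def cov_to_corr(sigma):
    """Correlation matrix extracted from a covariance matrix, mirroring (3.5)."""
    d_inv = np.diag(1.0 / np.sqrt(np.diag(sigma)))  # inverse of delta(sigma)
    return d_inv @ sigma @ d_inv

# Illustrative covariance matrix: standard deviations 2 and 3, covariance 1.2.
sigma = np.array([[4.0, 1.2],
                  [1.2, 9.0]])
P = cov_to_corr(sigma)
print(P)  # off-diagonal entry is 1.2 / (2 * 3) = 0.2
```

Because the diagonal matrix of standard deviations is trivially invertible when all variances are positive, no general matrix inversion is needed.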


Mean vectors and covariance matrices are manipulated extremely easily under
linear operations on the vector $\mathbf{X}$. For any matrix $B \in \mathbb{R}^{k \times d}$ and vector $\mathbf{b} \in \mathbb{R}^k$ we
have


$$E(B\mathbf{X} + \mathbf{b}) = B E(\mathbf{X}) + \mathbf{b}, \tag{3.7}$$
$$\operatorname{cov}(B\mathbf{X} + \mathbf{b}) = B \operatorname{cov}(\mathbf{X}) B'. \tag{3.8}$$


Covariance matrices (and hence correlation matrices) are therefore positive
semidefinite; writing $\Sigma$ for $\operatorname{cov}(\mathbf{X})$ we see that (3.8) implies that $\operatorname{var}(\mathbf{a}'\mathbf{X}) = \mathbf{a}'\Sigma\mathbf{a} \ge 0$
for any $\mathbf{a} \in \mathbb{R}^d$. If we have that $\mathbf{a}'\Sigma\mathbf{a} > 0$ for any $\mathbf{a} \in \mathbb{R}^d \setminus \{\mathbf{0}\}$, we say that $\Sigma$
is positive definite; in this case the matrix is invertible. We will make use of the
Cholesky factorization of positive-definite covariance matrices at many
points; it is well known that such a matrix can be written as $\Sigma = AA'$ for a lower
triangular matrix $A$ with positive diagonal elements. The matrix $A$ is known as the
Cholesky factor. It will be convenient to denote this factor by $\Sigma^{1/2}$ and its inverse by
$\Sigma^{-1/2}$. Note that there are other ways of defining the "square root" of a symmetric,
positive-definite matrix (such as the symmetric decomposition) but we will always
use $\Sigma^{1/2}$ to denote the Cholesky factor.

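In this chapter many properties of the multivariate distribution of a vector $\mathbf{X}$ are
demonstrated using the characteristic function, which is given by


$$\phi_{\mathbf{X}}(\mathbf{t}) = E(\exp(i\mathbf{t}'\mathbf{X})) = E(e^{i\mathbf{t}'\mathbf{X}}), \quad \mathbf{t} \in \mathbb{R}^d.$$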


3.1.2 Standard Estimators of Covariance and Correlation 


Suppose we have n observations of a d-dimensional risk-factor return vector denoted
$\mathbf{X}_1, \ldots, \mathbf{X}_n$. Typically, these would be daily, weekly, monthly or yearly observations
forming a multivariate time series. We will assume throughout this chapter that the 
observations are identically distributed in the window of observation and either 
independent or at least serially uncorrelated (also known as a multivariate white 
noise). As we discuss in Chapter 4, the assumption of independence may be roughly 
tenable for longer time intervals such as months or years. For shorter time intervals 
independence may be a less appropriate assumption (due to a phenomenon known 
as volatility clustering, discussed in Chapter 4) but serial correlation of returns is 
often quite weak.

We assume that the observations $\mathbf{X}_1, \ldots, \mathbf{X}_n$ come from a distribution with mean
vector $\boldsymbol{\mu}$, finite covariance matrix $\Sigma$ and correlation matrix $P$. We now briefly review
the standard estimators of these vector and matrix parameters.

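Standard method-of-moments estimators of $\boldsymbol{\mu}$ and $\Sigma$ are given by the sample
mean vector $\bar{\mathbf{X}}$ and the sample covariance matrix $S$. These are defined by


$$\bar{\mathbf{X}} := \frac{1}{n} \sum_{i=1}^n \mathbf{X}_i, \qquad S := \frac{1}{n} \sum_{i=1}^n (\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})', \tag{3.9}$$


where arithmetic operations on vectors and matrices are performed componentwise.
$\bar{\mathbf{X}}$ is an unbiased estimator but $S$ is biased; an unbiased version may be obtained by
taking $S_u := nS/(n - 1)$, as may be seen by calculating


$$n E(S) = E\Big( \sum_{i=1}^n (\mathbf{X}_i - \boldsymbol{\mu})(\mathbf{X}_i - \boldsymbol{\mu})' - n(\bar{\mathbf{X}} - \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu})' \Big) = \sum_{i=1}^n \operatorname{cov}(\mathbf{X}_i) - n \operatorname{cov}(\bar{\mathbf{X}}) = n\Sigma - \Sigma,$$


since $\operatorname{cov}(\bar{\mathbf{X}}) = n^{-1}\Sigma$ when the data vectors are iid, or identically distributed and
uncorrelated.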

The sample correlation matrix $R$ may be easily calculated from the sample covariance
matrix; its $(j, k)$th element is given by $r_{jk} = s_{jk} / \sqrt{s_{jj} s_{kk}}$, where $s_{jk}$ denotes
the $(j, k)$th element of $S$. Or, using the notation introduced in (3.5), we have


$$R = \wp(S),$$


which is the analogous equation to (3.6) for estimators. 
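
As an illustration, the estimators in (3.9), the unbiased version obtained by rescaling with n/(n - 1), and the sample correlation matrix can be computed as in the sketch below. It assumes the data are held in an (n, d) NumPy array with one observation per row; the function name `standard_estimators` and the simulated data are illustrative assumptions.

```python
import numpy as np

def standard_estimators(X):
    """Return (xbar, S, S_u, R) for an (n, d) array X whose rows are observations."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)                 # sample mean vector
    centred = X - xbar
    S = centred.T @ centred / n           # divisor n, as in (3.9); biased
    S_u = n * S / (n - 1)                 # unbiased version
    sd = np.sqrt(np.diag(S))
    R = S / np.outer(sd, sd)              # r_jk = s_jk / sqrt(s_jj * s_kk)
    return xbar, S, S_u, R

rng = np.random.default_rng(0)            # illustrative data only
X = rng.standard_normal((250, 3))
xbar, S, S_u, R = standard_estimators(X)
```

Note that `np.cov` uses the divisor n - 1 by default, so it corresponds to the unbiased version rather than to S in (3.9); passing `bias=True` gives S. The correlation matrix is the same either way, since the scaling cancels.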

Further properties of the estimators $\bar{\mathbf{X}}$, $S$ and $R$ will depend very much on the true
multivariate distribution of the observations. These quantities are not necessarily 
the best estimators of the corresponding theoretical quantities in all situations. This 
point is often forgotten in financial risk management, where sample covariance 
and correlation matrices are routinely calculated and interpreted with little critical 
consideration of underlying models. 

If our data $\mathbf{X}_1, \ldots, \mathbf{X}_n$ are iid multivariate normal, then $\bar{\mathbf{X}}$ and $S$ are the maximum
likelihood estimators (MLEs) of the mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$. Their
behaviour as estimators is well understood and statistical inference for the model 
parameters is described in all standard texts on multivariate analysis. 

However, the multivariate normal is certainly not a good description of financial 
risk factor returns over short time intervals, such as daily data, and is often not good 
over longer time intervals either. Under these circumstances the behaviour of the 
standard estimators in (3.9) is often less well understood and other estimators of 
the true mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$ may perform better in terms of
efficiency and robustness. Roughly speaking, by a more efficient estimator we mean 
an estimator with a smaller expected estimation error; by a more robust estimator 
we mean an estimator whose performance is not so susceptible to the presence of 
outlying data values.


3.1.3 The Multivariate Normal Distribution


Definition 3.1. $\mathbf{X} = (X_1, \ldots, X_d)'$ has a multivariate normal or Gaussian distribution
if

$$\mathbf{X} \stackrel{d}{=} \boldsymbol{\mu} + A\mathbf{Z},$$

where $\mathbf{Z} = (Z_1, \ldots, Z_k)'$ is a vector of iid univariate standard normal rvs (mean
zero and variance one), and $A \in \mathbb{R}^{d \times k}$ and $\boldsymbol{\mu} \in \mathbb{R}^d$ are a matrix and a vector of
constants, respectively.


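It is easy to verify, using (3.7) and (3.8), that the mean vector of this distribution is
$E(\mathbf{X}) = \boldsymbol{\mu}$ and the covariance matrix is $\operatorname{cov}(\mathbf{X}) = \Sigma$, where $\Sigma = AA'$ is a positive
semidefinite matrix. Moreover, using the fact that the characteristic function of
a standard univariate normal variate $Z$ is $\phi_Z(t) = \exp(-\tfrac{1}{2}t^2)$, the characteristic
function of $\mathbf{X}$ may be calculated to be


$$\phi_{\mathbf{X}}(\mathbf{t}) = E(\exp(i\mathbf{t}'\mathbf{X})) = \exp(i\mathbf{t}'\boldsymbol{\mu} - \tfrac{1}{2}\mathbf{t}'\Sigma\mathbf{t}), \quad \mathbf{t} \in \mathbb{R}^d. \tag{3.10}$$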


Clearly, the distribution is characterized by its mean vector and covariance matrix,
and hence a standard notation is $\mathbf{X} \sim N_d(\boldsymbol{\mu}, \Sigma)$. Note that the components of $\mathbf{X}$ are
mutually independent if and only if $\Sigma$ is diagonal. For example, $\mathbf{X} \sim N_d(\mathbf{0}, I_d)$ if
and only if $X_1, \ldots, X_d$ are iid $N(0, 1)$, the standard univariate normal distribution.

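We concentrate on the non-singular case of the multivariate normal when
$\operatorname{rank}(A) = d \le k$. In this case the covariance matrix $\Sigma$ has full rank $d$ and is
therefore invertible (non-singular) and positive definite. Moreover, $\mathbf{X}$ has an absolutely
continuous distribution function with joint density given by


$$f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\big\{-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})\big\}, \quad \mathbf{x} \in \mathbb{R}^d, \tag{3.11}$$


where $|\Sigma|$ denotes the determinant of $\Sigma$.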

The form of the density clearly shows that points with equal density lie on ellipsoids
determined by equations of the form $(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}) = c$, for constants
$c > 0$. In two dimensions the contours of equal density are ellipses, as illustrated
in Figure 3.1. Whenever a multivariate density $f(\mathbf{x})$ depends on $\mathbf{x}$ only through
the quadratic form $(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})$, it is the density of a so-called elliptical
distribution, as discussed in more detail in Section 3.3.

Figure 3.1. (a) Perspective and contour plots for the density of a bivariate normal
distribution with standard normal margins and correlation -70%. (b) Corresponding plots for a
bivariate t density with four degrees of freedom (see Example 3.7 for details) and the same
mean vector and covariance matrix as the normal distribution. Contour lines are plotted at
the same heights for both densities.

Definition 3.1 is essentially a simulation recipe for the multivariate normal
distribution. To be explicit, if we wished to generate a vector $\mathbf{X}$ with distribution
$N_d(\boldsymbol{\mu}, \Sigma)$, where $\Sigma$ is positive definite, we would use the following algorithm.


Algorithm 3.2 (simulation of multivariate normal distribution).


(1) Perform a Cholesky decomposition of $\Sigma$ (see, for example, Press et al. 1992)
to obtain the Cholesky factor $\Sigma^{1/2}$.


(2) Generate a vector $\mathbf{Z} = (Z_1, \ldots, Z_d)'$ of independent standard normal variates.


(3) Set $\mathbf{X} = \boldsymbol{\mu} + \Sigma^{1/2}\mathbf{Z}$.
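
The three steps of Algorithm 3.2 translate almost line by line into code. The sketch below uses NumPy and generates n vectors at once, stored as rows; the function name `simulate_mvn`, the seed and the parameter values are illustrative assumptions (the covariance matrix mimics the correlation of -70% used in Figure 3.1).

```python
import numpy as np

def simulate_mvn(mu, sigma, n, rng):
    """Draw n vectors from N_d(mu, sigma) by the three steps of Algorithm 3.2."""
    mu = np.asarray(mu, dtype=float)
    A = np.linalg.cholesky(np.asarray(sigma, dtype=float))  # step 1: sigma = A A'
    Z = rng.standard_normal((n, mu.size))                   # step 2: iid N(0, 1)
    return mu + Z @ A.T                                     # step 3: mu + A Z, row-wise

rng = np.random.default_rng(42)
sigma = np.array([[1.0, -0.7],
                  [-0.7, 1.0]])
X = simulate_mvn([0.0, 0.0], sigma, 100_000, rng)
print(np.cov(X, rowvar=False))  # should be close to sigma
```

`np.linalg.cholesky` returns the lower triangular factor, which matches the convention for the Cholesky factor used in this chapter; it raises an error if the matrix is not positive definite.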


We now summarize further useful properties of the multivariate normal. These 
properties underline the attractiveness of the multivariate normal for computational 
work in risk management. Note, however, that many of them are in fact shared by 
the broader classes of normal mixture distributions and elliptical distributions (see 
Section 3.3.3 for properties of the latter). 


Linear combinations. If we take linear combinations of multivariate normal random
vectors, then these remain multivariate normal. Let $\mathbf{X} \sim N_d(\boldsymbol{\mu}, \Sigma)$ and take
any $B \in \mathbb{R}^{k \times d}$ and $\mathbf{b} \in \mathbb{R}^k$. Then it is easily shown, for example using the characteristic
function (3.10), that


$$B\mathbf{X} + \mathbf{b} \sim N_k(B\boldsymbol{\mu} + \mathbf{b}, B\Sigma B'). \tag{3.12}$$

As a special case, if $\mathbf{a} \in \mathbb{R}^d$, then

$$\mathbf{a}'\mathbf{X} \sim N(\mathbf{a}'\boldsymbol{\mu}, \mathbf{a}'\Sigma\mathbf{a}), \tag{3.13}$$


and this fact is used routinely in the variance-covariance approach to risk management,
as discussed in Section 2.3.1.

In this context it is interesting to note the following elegant characterization of
multivariate normality. It is easily shown using characteristic functions that $\mathbf{X}$ is
multivariate normal if and only if $\mathbf{a}'\mathbf{X}$ is univariate normal for all vectors
$\mathbf{a} \in \mathbb{R}^d \setminus \{\mathbf{0}\}$.


Marginal distributions. It is clear from (3.13) that the univariate marginal distributions
of $\mathbf{X}$ must be univariate normal. More generally, using the $\mathbf{X} = (\mathbf{X}_1', \mathbf{X}_2')'$
notation from Section 3.1.1 and extending this notation naturally to $\boldsymbol{\mu}$ and $\Sigma$,


$$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$


property (3.12) implies that the marginal distributions of $\mathbf{X}_1$ and $\mathbf{X}_2$ are also multivariate
normal and are given by $\mathbf{X}_1 \sim N_k(\boldsymbol{\mu}_1, \Sigma_{11})$ and $\mathbf{X}_2 \sim N_{d-k}(\boldsymbol{\mu}_2, \Sigma_{22})$.


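Conditional distributions. Assuming that $\Sigma$ is positive definite, the conditional
distributions of $\mathbf{X}_2$ given $\mathbf{X}_1$ and of $\mathbf{X}_1$ given $\mathbf{X}_2$ may also be shown to be multivariate
normal. For example, $\mathbf{X}_2 \mid \mathbf{X}_1 = \mathbf{x}_1 \sim N_{d-k}(\boldsymbol{\mu}_{2.1}, \Sigma_{22.1})$, where


$$\boldsymbol{\mu}_{2.1} = \boldsymbol{\mu}_2 + \Sigma_{21}\Sigma_{11}^{-1}(\mathbf{x}_1 - \boldsymbol{\mu}_1) \quad \text{and} \quad \Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$

are the conditional mean vector and covariance matrix.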
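
The conditional mean vector and covariance matrix of the second block given the first, namely $\boldsymbol{\mu}_2 + \Sigma_{21}\Sigma_{11}^{-1}(\mathbf{x}_1 - \boldsymbol{\mu}_1)$ and $\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, can be evaluated with a few lines of linear algebra. The following is a minimal sketch, assuming the first k components form the conditioning block; the function name `conditional_mvn` and the numbers in the example are illustrative assumptions.

```python
import numpy as np

def conditional_mvn(mu, sigma, k, x1):
    """Mean and covariance of X_2 | X_1 = x1, where X_1 is the first k components."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mu1, mu2 = mu[:k], mu[k:]
    s11, s12 = sigma[:k, :k], sigma[:k, k:]
    s22 = sigma[k:, k:]
    # B = sigma_21 @ inv(sigma_11), computed by a linear solve (sigma is symmetric).
    B = np.linalg.solve(s11, s12).T
    mu_cond = mu2 + B @ (np.asarray(x1, dtype=float) - mu1)
    sigma_cond = s22 - B @ s12
    return mu_cond, sigma_cond

# Bivariate illustration: variances 1 and 2, covariance 0.5, conditioning on X_1 = 2.
mu_cond, sigma_cond = conditional_mvn([0.0, 0.0], [[1.0, 0.5], [0.5, 2.0]], 1, [2.0])
print(mu_cond, sigma_cond)  # conditional mean 1.0, conditional variance 1.75
```

Note that the conditional covariance matrix does not depend on the conditioning value, only the conditional mean does.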
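
Quadratic forms. If $\mathbf{X} \sim N_d(\boldsymbol{\mu}, \Sigma)$ with $\Sigma$ positive definite, then

$$(\mathbf{X} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{X} - \boldsymbol{\mu}) \sim \chi^2_d, \tag{3.14}$$


a chi-squared distribution with $d$ degrees of freedom. This is seen by observing that
$\mathbf{Z} = \Sigma^{-1/2}(\mathbf{X} - \boldsymbol{\mu}) \sim N_d(\mathbf{0}, I_d)$ and $(\mathbf{X} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{X} - \boldsymbol{\mu}) = \mathbf{Z}'\mathbf{Z} \sim \chi^2_d$. This
property (3.14) is useful for checking multivariate normality (see Section 3.1.4).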


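Convolutions. If $\mathbf{X}$ and $\mathbf{Y}$ are independent d-dimensional random vectors satisfying
$\mathbf{X} \sim N_d(\boldsymbol{\mu}, \Sigma)$ and $\mathbf{Y} \sim N_d(\tilde{\boldsymbol{\mu}}, \tilde{\Sigma})$, then we may take the product of characteristic
functions to show that $\mathbf{X} + \mathbf{Y} \sim N_d(\boldsymbol{\mu} + \tilde{\boldsymbol{\mu}}, \Sigma + \tilde{\Sigma})$.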


3.1.4 Testing Normality and Multivariate Normality 


We now consider the issue of testing whether the data $\mathbf{X}_1, \ldots, \mathbf{X}_n$ are observations
from a multivariate normal distribution.


Univariate tests. If $\mathbf{X}_1, \ldots, \mathbf{X}_n$ are iid multivariate normal, then for $1 \le j \le d$
the univariate sample $X_{1,j}, \ldots, X_{n,j}$ consisting of the observations of the jth component
must be iid univariate normal; in fact any univariate sample constructed from
a linear combination of the data of the form $\mathbf{a}'\mathbf{X}_1, \ldots, \mathbf{a}'\mathbf{X}_n$ must be iid univariate
normal. This can be assessed graphically with a QQplot against a standard normal
reference distribution or tested formally using one of the countless numerical tests
of normality. A QQplot (quantile-quantile plot) is a standard visual tool for showing
the relationship between empirical quantiles of the data and theoretical quantiles of a
reference distribution, with a lack of linearity showing evidence against the hypothesized
reference distribution. In Figure 3.2 we show a QQplot of daily returns of
the Disney share price from 1993 to 2000 against a normal reference distribution;
the inverted "S-shaped" curve of the points suggests that the more extreme empirical
quantiles of the data tend to be larger than the corresponding quantiles of a normal
distribution, indicating that the normal distribution is a poor model for these returns.

Figure 3.2. QQplot of daily returns of the Disney share price from 1993 to 2000
against a normal reference distribution (see also Example 3.3).


Particularly useful numerical tests include those of Jarque and Bera, Anderson
and Darling, Shapiro and Wilk, and D'Agostino. The Jarque-Bera test belongs to
the class of omnibus moment tests, i.e. tests which assess simultaneously whether
the skewness and kurtosis of the data are consistent with a Gaussian model. The
sample skewness and kurtosis coefficients of a univariate sample $Z_1, \ldots, Z_n$ are
defined by

$$\sqrt{b} = \frac{(1/n)\sum_{i=1}^n (Z_i - \bar{Z})^3}{\big((1/n)\sum_{i=1}^n (Z_i - \bar{Z})^2\big)^{3/2}}, \qquad k = \frac{(1/n)\sum_{i=1}^n (Z_i - \bar{Z})^4}{\big((1/n)\sum_{i=1}^n (Z_i - \bar{Z})^2\big)^2}. \tag{3.15}$$


These are designed to estimate the theoretical skewness and kurtosis, which are
defined, respectively, by $\sqrt{\beta} = E(Z - \mu)^3/\sigma^3$ and $\kappa = E(Z - \mu)^4/\sigma^4$, where
$\mu = E(Z)$ and $\sigma^2 = \operatorname{var}(Z)$ denote mean and variance; $\sqrt{\beta}$ and $\kappa$ take the values
zero and three for a normal variate $Z$. The Jarque-Bera test statistic is


$$T = \tfrac{1}{6} n \big(b + \tfrac{1}{4}(k - 3)^2\big)$$


and has an asymptotic chi-squared distribution with two degrees of freedom under
the null hypothesis of normality; sample kurtosis values differing widely from three
and skewness values differing widely from zero may lead to rejection of normality.
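
The Jarque-Bera calculation is short enough to write out directly. The sketch below computes the sample skewness and kurtosis with divisor n and the statistic $T = \tfrac{1}{6}n(b + \tfrac{1}{4}(k - 3)^2)$; the function name `jarque_bera` and the toy sample are illustrative assumptions. The p-value uses the fact that the survival function of a chi-squared distribution with two degrees of freedom is exp(-T/2).

```python
import math
import numpy as np

def jarque_bera(z):
    """Sample skewness, sample kurtosis, Jarque-Bera statistic and asymptotic p-value."""
    z = np.asarray(z, dtype=float)
    n = z.size
    c = z - z.mean()
    m2 = np.mean(c**2)                       # second central moment, divisor n
    sqrt_b = np.mean(c**3) / m2**1.5         # sample skewness
    k = np.mean(c**4) / m2**2                # sample kurtosis
    T = n / 6.0 * (sqrt_b**2 + 0.25 * (k - 3.0)**2)
    p_value = math.exp(-T / 2.0)             # chi-squared(2) survival function
    return sqrt_b, k, T, p_value

# Toy symmetric sample: skewness is zero and kurtosis is 6.8 / 4 = 1.7.
sqrt_b, k, T, p_value = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

The p-value is asymptotic, so for small samples (such as the quarterly data considered later in this section) it should be read with some caution.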


Multivariate tests. To test for multivariate normality it is not sufficient to test that
the univariate margins of the distribution are normal. We will see in Chapter 5 that
it is possible to have multivariate distributions with normal margins that are not
themselves multivariate normal distributions. Thus we also need to be able to test
joint normality and a simple way of doing this is to exploit the fact that the quadratic
form in (3.14) has a chi-squared distribution. Suppose we estimate $\boldsymbol{\mu}$ and $\Sigma$ using
the standard estimators in (3.9) and construct the data


$$\{D_i^2 = (\mathbf{X}_i - \bar{\mathbf{X}})'S^{-1}(\mathbf{X}_i - \bar{\mathbf{X}}) : i = 1, \ldots, n\}. \tag{3.16}$$




Because the estimates of the mean vector and the covariance matrix are used in the
construction of each $D_i^2$, these data are not independent, even if the original $\mathbf{X}_i$ data
were. Moreover, the marginal distribution of $D_i^2$ under the null hypothesis is not
exactly chi-squared; we have in fact that $n(n - 1)^{-2}D_i^2 \sim \operatorname{Beta}(\tfrac{1}{2}d, \tfrac{1}{2}(n - d - 1))$,
so that the true distribution is a scaled beta distribution, although it turns out to be very
close to chi-squared for large n. We expect $D_1^2, \ldots, D_n^2$ to behave roughly like an
iid sample from a $\chi^2_d$ distribution and for simplicity we construct QQplots against
this distribution. (It is also possible to make QQplots against the beta reference
distribution and these look very similar.)
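
A sketch of how the points for such a QQplot might be computed is given below, using NumPy and `scipy.stats.chi2`. The function names, the simulated Gaussian data and the plotting positions (i - 0.5)/n are illustrative assumptions; other plotting positions are equally defensible.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_sq(X):
    """Squared Mahalanobis distances D_i^2 of the rows of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centred = X - X.mean(axis=0)
    S = centred.T @ centred / n                      # sample covariance, divisor n
    # Row-wise quadratic form (X_i - xbar)' S^{-1} (X_i - xbar).
    return np.einsum("ij,ij->i", centred @ np.linalg.inv(S), centred)

def chi2_qq_points(X):
    """Theoretical chi-squared(d) quantiles paired with the ordered D_i^2."""
    n, d = np.asarray(X).shape
    ordered = np.sort(mahalanobis_sq(X))
    probs = (np.arange(1, n + 1) - 0.5) / n          # assumed plotting positions
    return chi2.ppf(probs, df=d), ordered

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))                    # Gaussian data, so the null holds
theoretical, empirical = chi2_qq_points(X)
```

Plotting `empirical` against `theoretical` gives the QQplot; under the null hypothesis the points should lie close to a straight line. A useful sanity check is that, with S defined using the divisor n, the squared distances always average exactly d.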

Numerical tests of multivariate normality based on multivariate measures of skewness
and kurtosis are also possible. Suppose we define, in analogy to (3.15),


$$b_d = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n D_{ij}^3, \qquad k_d = \frac{1}{n} \sum_{i=1}^n D_i^4, \tag{3.17}$$


where $D_i$ is given in (3.16) and is known as the Mahalanobis distance between
$\mathbf{X}_i$ and $\bar{\mathbf{X}}$, and $D_{ij} = (\mathbf{X}_i - \bar{\mathbf{X}})'S^{-1}(\mathbf{X}_j - \bar{\mathbf{X}})$ is known as the Mahalanobis angle
between $\mathbf{X}_i - \bar{\mathbf{X}}$ and $\mathbf{X}_j - \bar{\mathbf{X}}$. These measures in fact reduce to the univariate measures
$b$ and $k$ in the case $d = 1$. Under the null hypothesis of multivariate normality
the asymptotic distributions of these statistics as $n \to \infty$ are

$$\tfrac{1}{6} n b_d \sim \chi^2_{d(d+1)(d+2)/6}, \qquad \frac{k_d - d(d + 2)}{\sqrt{8d(d + 2)/n}} \sim N(0, 1). \tag{3.18}$$

Mardia's test of multinormality involves comparing the skewness and kurtosis statistics
with the above theoretical reference distributions. Since large values of the
statistics cast doubt on the multivariate normal model, one-sided tests are generally
performed. Usually the tests of kurtosis and skewness are performed separately,
although there are also a number of joint (or so-called omnibus) tests (see Notes and
Comments).
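
The multivariate skewness and kurtosis statistics and their one-sided asymptotic p-values can be sketched as follows, using the standard forms $\tfrac{1}{6}nb_d \sim \chi^2_{d(d+1)(d+2)/6}$ and $(k_d - d(d+2))/\sqrt{8d(d+2)/n} \sim N(0,1)$. The function name `mardia_tests` and the simulated data are illustrative assumptions. The sketch forms the full n x n matrix of Mahalanobis angles, which is convenient but memory-hungry for very large n.

```python
import numpy as np
from scipy.stats import chi2, norm

def mardia_tests(X):
    """Return (b_d, k_d, p_skew, p_kurt) for an (n, d) array of observations."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    centred = X - X.mean(axis=0)
    S = centred.T @ centred / n                    # sample covariance, divisor n
    G = centred @ np.linalg.inv(S) @ centred.T     # G[i, j] = D_ij; diagonal holds D_i^2
    b_d = np.sum(G**3) / n**2                      # multivariate skewness
    k_d = np.mean(np.diag(G)**2)                   # multivariate kurtosis (mean of D_i^4)
    skew_df = d * (d + 1) * (d + 2) / 6.0
    p_skew = chi2.sf(n * b_d / 6.0, df=skew_df)    # one-sided: large b_d rejects
    kurt_stat = (k_d - d * (d + 2)) / np.sqrt(8.0 * d * (d + 2) / n)
    p_kurt = norm.sf(kurt_stat)                    # one-sided: large k_d rejects
    return b_d, k_d, p_skew, p_kurt

rng = np.random.default_rng(7)
X_heavy = rng.standard_t(3, size=(1000, 3))        # heavy-tailed, non-Gaussian data
b_d, k_d, p_skew, p_kurt = mardia_tests(X_heavy)
```

For the heavy-tailed sample above the kurtosis test should reject normality decisively, which is the qualitative pattern one would expect for daily financial returns.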


Example 3.3 (on the normality of returns on Dow Jones 30 stocks). We applied 
tests of normality to an arbitrary subgroup of 10 of the stocks comprising the Dow 
Jones index (see Table 3.1 for the stock codes and Table 4.1 for names). We took eight 
years of data spanning the period 1993-2000 and formed daily, weekly, monthly 
and quarterly logarithmic returns. For each stock we calculated sample skewness 
and kurtosis and applied the Jarque-Bera test to the univariate time series. The daily
and weekly return data fail all tests; in particular, it is notable that there are some 
large values for the sample kurtosis. For the monthly data, the null hypothesis of 
normality is not formally rejected (p-value greater than 0.05) for four of the stocks; 
for quarterly data it is not rejected for five of the stocks, although here the sample 
size is small. 

We applied Mardia’s tests of multinormality based on both multivariate skewness 
and kurtosis to the multivariate data for all 10 stocks. The results are shown in 
Table 3.2. We also compared the $D_i^2$ data (3.16) to a $\chi^2_{10}$ distribution using a QQplot
(see Figure 3.3).

Table 3.1. Sample skewness ($\sqrt{b}$) and kurtosis ($k$) coefficients as well as p-values for
Jarque-Bera tests of normality for an arbitrary set of 10 of the Dow Jones 30 stocks (see
Example 3.3 for details).


            Daily returns, n = 2020          Weekly returns, n = 416
Stock     sqrt(b)       k   p-value        sqrt(b)       k   p-value
AXP          0.05    5.09      0.00          -0.01    3.91      0.00
EK          -1.93   31.20      0.00          -1.13   14.40      0.00
BA          -0.34   10.89      0.00          -0.26    7.54      0.00
C            0.21    5.93      0.00           0.44    5.42      0.00
KO          -0.02    6.36      0.00          -0.21    4.37      0.00
MSFT        -0.22    8.04      0.00          -0.14    5.25      0.00
HWP         -0.23    6.69      0.00          -0.26    4.66      0.00
INTC        -0.56    8.29      0.00          -0.65    5.20      0.00
JPM          0.14    5.25      0.00          -0.20    4.93      0.00
DIS         -0.01    9.39      0.00           0.08    4.48      0.00

            Monthly returns, n = 96          Quarterly returns, n = 32
Stock     sqrt(b)       k   p-value        sqrt(b)       k   p-value
AXP         -1.22    5.99      0.00          -1.04    4.88      0.01
EK          -1.52   10.37      0.00          -0.63    4.49      0.08
BA          -0.50    4.15      0.01          -0.15    6.23      0.00
C           -1.10    7.38      0.00          -1.61    7.13      0.00
KO          -0.49    3.68      0.06          -1.45    5.21      0.00
MSFT        -0.40    3.90      0.06          -0.56    2.90      0.43
HWP         -0.33    3.47      0.27          -0.38    3.64      0.52
INTC        -1.04    6.50      0.00          -0.42    3.10      0.62
JPM         -0.51    5.40      0.00          -0.78    7.26      0.00
DIS          0.04    3.26      0.87          -0.49    4.32      0.16


Table 3.2. Mardia's tests of multivariate normality based on the multivariate measures of
skewness and kurtosis in (3.17) and the asymptotic distributions in (3.18) (see Example 3.3
for details).


              Daily    Weekly   Monthly   Quarterly
n              2020       416        96          32
b_10           9.31      9.91     21.10       50.10
p-value        0.00      0.00      0.00        0.02
k_10         242.45    177.04    142.65      120.83
p-value        0.00      0.00      0.00        0.44


The daily, weekly and monthly return data fail the multivariate tests of normality.
For quarterly return data the multivariate kurtosis test does not reject the null
hypothesis, but the skewness test does; the QQplot in Figure 3.3(d) looks slightly
more linear. Thus there is some evidence that returns over a quarter year are close to
being normally distributed, which might indicate a central limit theorem effect taking
place, although the sample size is too small to reach any more reliable conclusion.
The evidence against the multivariate normal distribution is certainly overwhelming
for daily, weekly and monthly data.

Figure 3.3. QQplot of the $D_i^2$ data in (3.16) against a $\chi^2_{10}$ distribution for the datasets of
Example 3.3: (a) daily analysis; (b) weekly analysis; (c) monthly analysis; and (d) quarterly
analysis. Under the null hypothesis of multivariate normality these should be roughly linear.


The results in Example 3.3 are fairly typical for financial return data. This suggests 
that in many risk-management applications the multivariate normal distribution is 
not a good description of reality. It has three main defects that we will discuss at 
various points in this book. 


(1) The tails of its univariate marginal distributions are too thin; they do not assign 
enough weight to extreme events. 


(2) The joint tails of the distribution do not assign enough weight to joint extreme 
outcomes. 


(3) The distribution has a strong form of symmetry, known as elliptical symmetry. 


In the next section we look at models that address some of these defects. We con- 
sider normal variance mixture models, which share the elliptical symmetry of the 
multivariate normal, but have the flexibility to address (1) and (2) above; we also 
look at normal mean-variance mixture models, which introduce some asymmetry 
and thus address (3).


Notes and Comments


Much of the material covered briefly in Section 3.1 can be found in greater detail 
in standard texts on multivariate statistical analysis such as Mardia, Kent and Bibby 
(1979), Seber (1984), Giri (1996) or Johnson and Wichern (2002). 

There are countless possible tests of univariate normality and a good starting point 
is the entry on “departures from normality, tests for” in volume 2 of the Encyclopedia 
of Statistics (Kotz, Johnson and Read 1985). For an introduction to QQplots see Rice 
(1995, pp. 353-357); for the widely applied Jarque-Bera test based on the sample
skewness and kurtosis, see Jarque and Bera (1987).

The true distribution of $D_i^2 = (\mathbf{X}_i - \bar{\mathbf{X}})'S^{-1}(\mathbf{X}_i - \bar{\mathbf{X}})$ for iid Gaussian data was
shown by Gnanadesikan and Kettenring (1972) to be a scaled beta distribution 
(see also Gnanadesikan 1997). The implications of this fact for the construction of 
QQplots in small samples are considered by Small (1978). References for multi- 
variate measures of skewness and kurtosis and Mardia’s test of multinormality are 
Mardia (1970, 1974, 1975). See also Mardia, Kent and Bibby (1979), the entry on 
“multivariate normality, testing for” in volume 6 of the Encyclopedia of Statistics 
(Kotz, Johnson and Read 1985), and the entry on “Mardia’s test of multinormality” 
in volume 5 of the same publication. 


3.2 Normal Mixture Distributions 


In this section we generalize the multivariate normal to obtain multivariate normal 
mixture distributions. The crucial idea is the introduction of randomness into first 
the covariance matrix and then the mean vector of a multivariate normal distribution 
via a positive mixing variable, which will be known throughout as W. 


3.2.1 Normal Variance Mixtures 


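Definition 3.4. The random vector $\mathbf{X}$ is said to have a (multivariate) normal variance
mixture distribution if

$$\mathbf{X} \stackrel{d}{=} \boldsymbol{\mu} + \sqrt{W} A\mathbf{Z}, \tag{3.19}$$


where

(i) $\mathbf{Z} \sim N_k(\mathbf{0}, I_k)$;

(ii) $W \ge 0$ is a non-negative, scalar-valued rv which is independent of $\mathbf{Z}$; and

(iii) $A \in \mathbb{R}^{d \times k}$ and $\boldsymbol{\mu} \in \mathbb{R}^d$ are a matrix and vector of constants, respectively.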


Such distributions are known as variance mixtures, since if we condition on the rv
$W$ we observe that $\mathbf{X} \mid W = w \sim N_d(\boldsymbol{\mu}, w\Sigma)$, where $\Sigma = AA'$. The distribution
of $\mathbf{X}$ can be thought of as a composite distribution constructed by taking a set of
multivariate normal distributions with the same mean vector and with the same
covariance matrix up to a multiplicative constant $w$. The mixture distribution is
constructed by drawing randomly from this set of component multivariate normals
according to a set of "weights" determined by the distribution of $W$; the resulting
mixture is not itself a multivariate normal distribution. In the context of modelling
risk-factor returns, the mixing variable $W$ could be interpreted as a shock that arises
from new information and impacts the volatilities of all stocks.

As for the multivariate normal, we are most interested in the case where $\operatorname{rank}(A) =
d \le k$ and $\Sigma$ is a full-rank, positive-definite matrix; this will give us a non-singular
normal variance mixture.

Provided that $W$ has a finite expectation, we may easily calculate that


$$E(\mathbf{X}) = E(\boldsymbol{\mu} + \sqrt{W} A\mathbf{Z}) = \boldsymbol{\mu} + E(\sqrt{W}) A E(\mathbf{Z}) = \boldsymbol{\mu}$$

and that

$$\operatorname{cov}(\mathbf{X}) = E\big((\sqrt{W} A\mathbf{Z})(\sqrt{W} A\mathbf{Z})'\big) = E(W) A E(\mathbf{Z}\mathbf{Z}') A' = E(W)\Sigma. \tag{3.20}$$


We refer to $\boldsymbol{\mu}$ and $\Sigma$ in general as the location vector and the dispersion matrix of
the distribution. Note that $\Sigma$ (the covariance matrix of $A\mathbf{Z}$) is only the covariance
matrix of $\mathbf{X}$ if $E(W) = 1$ and that $\boldsymbol{\mu}$ is only the mean vector when $E(\mathbf{X})$ is defined,
which requires $E(W^{1/2}) < \infty$. The correlation matrices of $\mathbf{X}$ and $A\mathbf{Z}$ are the same
when $E(W) < \infty$. Note also that these distributions provide good examples of
models where a lack of correlation does not necessarily imply independence of the
components of $\mathbf{X}$; indeed we have the following simple result.


Lemma 3.5. Let $(X_1, X_2)$ have a normal mixture distribution with $A = I_2$ and
$E(W) < \infty$ so that $\operatorname{cov}(X_1, X_2) = 0$. Then $X_1$ and $X_2$ are independent if and only
if $W$ is almost surely constant, i.e. $(X_1, X_2)$ are normally distributed.


Proof. We calculate that

$$E(|X_1||X_2|) = E(W|Z_1||Z_2|) = E(W) E(|Z_1|) E(|Z_2|) \ge (E(\sqrt{W}))^2 E(|Z_1|) E(|Z_2|) = E(|X_1|) E(|X_2|),$$


with equality throughout only when $W$ is a constant.


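Using (3.10), we can calculate that the characteristic function of a normal variance
mixture is given by

$$\phi_{\mathbf{X}}(\mathbf{t}) = E\big(E(\exp(i\mathbf{t}'\mathbf{X}) \mid W)\big) = E\big(\exp(i\mathbf{t}'\boldsymbol{\mu} - \tfrac{1}{2}W\mathbf{t}'\Sigma\mathbf{t})\big) = \exp(i\mathbf{t}'\boldsymbol{\mu}) \hat{H}(\tfrac{1}{2}\mathbf{t}'\Sigma\mathbf{t}), \tag{3.21}$$

where $\hat{H}(\theta) = \int_0^\infty e^{-\theta v} \, dH(v)$ is the Laplace-Stieltjes transform of the df $H$ of
$W$. Based on (3.21) we use the notation $\mathbf{X} \sim M_d(\boldsymbol{\mu}, \Sigma, \hat{H})$ for normal variance
mixtures.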
Assuming that $\Sigma$ is positive definite and that the distribution of $W$ has no point
mass at zero, we may derive the joint density of a normal variance mixture distribution.
Writing $f_{\mathbf{X} \mid W}$ for the (Gaussian) conditional density of $\mathbf{X}$ given $W$, the density
of $\mathbf{X}$ is given by


$$f(\mathbf{x}) = \int f_{\mathbf{X} \mid W}(\mathbf{x} \mid w) \, dH(w) = \int \frac{w^{-d/2}}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\Big\{-\frac{(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})}{2w}\Big\} \, dH(w), \tag{3.22}$$


in terms of the Lebesgue-Stieltjes integral; when $H$ has density $h$ we simply mean
the Riemann integral $\int_0^\infty f_{\mathbf{X} \mid W}(\mathbf{x} \mid w) h(w) \, dw$. All such densities will depend on $\mathbf{x}$
only through the quadratic form $(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})$ and this means they are the
densities of elliptical distributions, as will be discussed in Section 3.3.


Example 3.6 (multivariate two-point normal mixture distribution). Simple
examples of normal mixtures are obtained when $W$ is a discrete rv. For example,
the two-point normal mixture model is obtained by taking $W$ in (3.19) to be a
discrete rv which assumes the distinct positive values $k_1$ and $k_2$ with probabilities $p$
and $1 - p$, respectively. By setting $k_2$ large relative to $k_1$ and choosing $p$ large, this
distribution might be used to define two regimes: an ordinary regime that holds most
of the time and a stress regime that occurs with small probability $1 - p$. Obviously
this idea extends to k-point mixture models.
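
A two-point mixture is easy to simulate by first drawing the regime and then scaling a Gaussian vector. The sketch below is a minimal illustration using NumPy; the function name `simulate_two_point_mixture` and the parameter values (ordinary regime with W = 1, stress regime with W = 9 occurring with probability 0.1) are assumptions chosen purely for illustration.

```python
import numpy as np

def simulate_two_point_mixture(mu, sigma, k1, k2, p, n, rng):
    """Draw n vectors mu + sqrt(W) A Z with P(W = k1) = p and P(W = k2) = 1 - p."""
    mu = np.asarray(mu, dtype=float)
    A = np.linalg.cholesky(np.asarray(sigma, dtype=float))   # sigma = A A'
    W = np.where(rng.random(n) < p, k1, k2)                  # regime for each draw
    Z = rng.standard_normal((n, mu.size))
    return mu + np.sqrt(W)[:, None] * (Z @ A.T)

rng = np.random.default_rng(5)
sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
X_mix = simulate_two_point_mixture([0.0, 0.0], sigma, 1.0, 9.0, 0.9, 200_000, rng)
expected_W = 0.9 * 1.0 + 0.1 * 9.0                           # E(W) = 1.8
print(np.cov(X_mix, rowvar=False) / expected_W)              # roughly sigma
```

The sample covariance matrix should be close to E(W) times the dispersion matrix rather than to the dispersion matrix itself, since the mixing variable here does not have mean one.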


Example 3.7 (multivariate t distribution). If we take $W$ in (3.19) to be an rv with
an inverse gamma distribution $W \sim \operatorname{Ig}(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)$ (which is equivalent to saying that
$\nu/W \sim \chi^2_\nu$), then $\mathbf{X}$ has a multivariate t distribution with $\nu$ degrees of freedom
(see Section A.2.6 for more details concerning the inverse gamma distribution). Our
notation for the multivariate t is $\mathbf{X} \sim t_d(\nu, \boldsymbol{\mu}, \Sigma)$ and we note that $\Sigma$ is not the
covariance matrix of $\mathbf{X}$ in this definition of the multivariate t. Since $E(W) = \nu/(\nu - 2)$
we have $\operatorname{cov}(\mathbf{X}) = (\nu/(\nu - 2))\Sigma$ and the covariance matrix (and correlation
matrix) of this distribution are only defined if $\nu > 2$.
Using (3.22), the density can be calculated to be


$$f(\mathbf{x}) = \frac{\Gamma(\tfrac{1}{2}(\nu + d))}{\Gamma(\tfrac{1}{2}\nu)(\pi\nu)^{d/2}|\Sigma|^{1/2}} \Big(1 + \frac{(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})}{\nu}\Big)^{-(\nu + d)/2}. \tag{3.23}$$


Clearly, the locus of points with equal density is again an ellipsoid with equation
$(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}) = c$, for some $c > 0$. A bivariate example with four degrees
of freedom is given in Figure 3.1. In comparison with the multivariate normal the
contours of equal density rise more quickly in the centre of the distribution and decay
more gradually on the "lower slopes" of the distribution. We will see later that, in
comparison with the multivariate normal, the multivariate t has heavier marginal
tails (Chapter 7) and a more pronounced tendency to generate simultaneous extreme
values (Section 5.3.1).
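
The mixture representation gives a direct way to simulate the multivariate t: draw a chi-squared variate with $\nu$ degrees of freedom, set W equal to $\nu$ divided by it, and scale a Gaussian vector by the square root of W. The following is a minimal sketch with NumPy; the function name `simulate_mvt` and the parameter values are illustrative assumptions. Eight degrees of freedom are used so that sample covariances are reasonably stable.

```python
import numpy as np

def simulate_mvt(nu, mu, sigma, n, rng):
    """Draw n vectors from t_d(nu, mu, sigma) via the normal variance mixture."""
    mu = np.asarray(mu, dtype=float)
    A = np.linalg.cholesky(np.asarray(sigma, dtype=float))   # sigma = A A'
    W = nu / rng.chisquare(nu, size=n)                       # so that nu / W ~ chi-squared(nu)
    Z = rng.standard_normal((n, mu.size))
    return mu + np.sqrt(W)[:, None] * (Z @ A.T)

rng = np.random.default_rng(9)
nu = 8
sigma = np.array([[1.0, -0.7],
                  [-0.7, 1.0]])
X_t = simulate_mvt(nu, [0.0, 0.0], sigma, 200_000, rng)
print(np.cov(X_t, rowvar=False))  # roughly (nu / (nu - 2)) * sigma
```

The same W multiplies every component of a given draw, which is what produces the tendency towards simultaneous extreme values.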


Example 3.8 (symmetric generalized hyperbolic distribution). A flexible family
of normal variance mixtures is obtained by taking $W$ in (3.19) to have a generalized
inverse Gaussian (GIG) distribution, $W \sim N^{-}(\lambda, \chi, \psi)$ (see Section A.2.5). Using
(3.22), it can be shown that a normal variance mixture constructed with this mixing
density has the joint density

\[
f(x) = \frac{(\sqrt{\chi\psi})^{-\lambda}\psi^{d/2}}{(2\pi)^{d/2}|\Sigma|^{1/2}K_\lambda(\sqrt{\chi\psi})}\,
\frac{K_{\lambda-(d/2)}\bigl(\sqrt{(\chi + (x-\mu)'\Sigma^{-1}(x-\mu))\psi}\bigr)}
{\bigl(\sqrt{(\chi + (x-\mu)'\Sigma^{-1}(x-\mu))\psi}\bigr)^{(d/2)-\lambda}}, \tag{3.24}
\]

where $K_\lambda$ denotes a modified Bessel function of the third kind (see Section A.2.5
for more details). This distribution is a special case of the more general family of
multivariate generalized hyperbolic distributions, which we will discuss in greater
detail in Section 3.2.3. The more general family can be obtained as mean-variance
mixtures of normals, which are not necessarily elliptical distributions.

The GIG mixing distribution is very flexible and contains the gamma and inverse
gamma distributions as special boundary cases (corresponding, respectively, to $\lambda > 0$,
$\chi = 0$ and to $\lambda < 0$, $\psi = 0$). In these cases the density in (3.24) should be
interpreted as a limit as $\chi \to 0$ or as $\psi \to 0$. (Information on the limits of Bessel
functions is found in Section A.2.5.) The gamma mixing distribution yields Laplace
distributions or so-called symmetric variance-gamma models and the inverse gamma
yields the t as in Example 3.7; to be precise the t corresponds to the case when
$\lambda = -\nu/2$ and $\chi = \nu$. The special cases $\lambda = -0.5$ and $\lambda = 1$ have also had attention in
financial modelling. The former gives rise to the symmetric normal inverse Gaussian
(NIG) distribution; the latter gives rise to a symmetric multivariate distribution
whose one-dimensional margins are known simply as hyperbolic distributions.

To calculate the covariance matrix of distributions in the symmetric generalized
hyperbolic family, we require the mean of the GIG distribution, which is given
in (A.9) for the case $\chi > 0$ and $\psi > 0$. The covariance matrix of the multivariate
distribution in (3.24) follows from (3.20).


Normal variance mixture distributions are easy to work with under linear opera- 
tions, as shown in the following simple proposition. 


Proposition 3.9. If $X \sim M_d(\mu, \Sigma, \hat{H})$ and $Y = BX + b$, where $B \in \mathbb{R}^{k \times d}$ and
$b \in \mathbb{R}^k$, then $Y \sim M_k(B\mu + b, B\Sigma B', \hat{H})$.


Proof. The characteristic function in (3.21) may be used to show that

\[
\phi_Y(t) = E(e^{it'(BX+b)}) = e^{it'b}\phi_X(B't) = e^{it'(B\mu+b)}\hat{H}(\tfrac12 t'B\Sigma B't).
\]

Thus the subclass of mixture distributions specified by $\hat{H}$ is closed under linear
transformations. For example, if $X$ has a multivariate t distribution with $\nu$ degrees
of freedom, then so does any linear transformation of $X$; the linear combination $a'X$
would have a univariate t distribution with $\nu$ degrees of freedom (more precisely,
the distribution $a'X \sim t_1(\nu, a'\mu, a'\Sigma a)$).

Normal variance mixture distributions (and the mean-variance mixtures considered
later in Section 3.2.2) are easily simulated, the method being obvious from
Definition 3.4. To generate a variate $X \sim M_d(\mu, \Sigma, \hat{H})$ with $\Sigma$ positive definite we
use the following algorithm.


Algorithm 3.10 (simulation of normal variance mixtures).


(1) Generate $Z \sim N_d(0, \Sigma)$ using Algorithm 3.2.


(2) Generate independently a positive mixing variable $W$ with df $H$ (corresponding
to the Laplace-Stieltjes transform $\hat{H}$).


(3) Set $X = \mu + \sqrt{W} Z$.

To generate $X \sim t_d(\nu, \mu, \Sigma)$, the mixing variable $W$ should have an $\mathrm{Ig}(\tfrac12\nu, \tfrac12\nu)$
distribution; it is helpful to note that in this case $\nu/W \sim \chi^2_\nu$, a chi-squared distribution
on $\nu$ degrees of freedom. Sampling from a generalized hyperbolic distribution with
density (3.24) requires us to generate $W \sim N^{-}(\lambda, \chi, \psi)$. Sampling from the GIG
distribution can be accomplished using a rejection algorithm proposed by Atkinson
(1982).
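The three steps of Algorithm 3.10 can be sketched for the multivariate t case as follows. This is a minimal illustration using only the Python standard library, not a reference implementation: the function names, the hand-written Cholesky factorization (standing in for Algorithm 3.2), the seed and the parameter values are all assumptions made for the example.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular A with A A' = sigma (sigma symmetric positive definite)."""
    d = len(sigma)
    a = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sigma[i][j] - sum(a[i][m] * a[j][m] for m in range(j))
            a[i][j] = math.sqrt(s) if i == j else s / a[j][j]
    return a


def sample_multivariate_t(nu, mu, sigma, n, rng):
    """Draw n variates from t_d(nu, mu, sigma) as X = mu + sqrt(W) Z."""
    d = len(mu)
    a = cholesky(sigma)
    draws = []
    for _ in range(n):
        # Step (1): Z ~ N_d(0, sigma), built from iid standard normals.
        y = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(a[i][j] * y[j] for j in range(i + 1)) for i in range(d)]
        # Step (2): nu / W ~ chi-squared(nu), i.e. Gamma(shape nu/2, scale 2).
        w = nu / rng.gammavariate(nu / 2.0, 2.0)
        # Step (3): X = mu + sqrt(W) Z.
        root_w = math.sqrt(w)
        draws.append([mu[i] + root_w * z[i] for i in range(d)])
    return draws


def sample_covariance(draws):
    """Unbiased sample covariance matrix of a list of d-vectors."""
    n, d = len(draws), len(draws[0])
    mean = [sum(x[i] for x in draws) / n for i in range(d)]
    return [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in draws) / (n - 1)
             for j in range(d)] for i in range(d)]


# Hypothetical parameter values chosen only for illustration.
rng = random.Random(42)
nu, mu = 10.0, [0.0, 1.0]
sigma = [[1.0, 0.5], [0.5, 2.0]]
draws = sample_multivariate_t(nu, mu, sigma, 20000, rng)
cov_hat = sample_covariance(draws)
cov_theory = [[nu / (nu - 2.0) * s for s in row] for row in sigma]
```

With a sample of this size, `cov_hat` should lie close to `cov_theory`, which is $(\nu/(\nu-2))\Sigma$ rather than $\Sigma$ itself.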


3.2.2 Normal Mean-Variance Mixtures 


All of the multivariate distributions we have considered so far have elliptical sym- 
metry and this may well be an oversimplified model for real risk-factor return data. 
Among other things, elliptical symmetry implies that all one-dimensional marginal 
distributions are rigidly symmetric, which contradicts the frequent observation for 
stock returns that negative returns (losses) have heavier tails than positive returns 
(gains). The models we now introduce attempt to add some asymmetry to the class 
of normal mixtures by mixing normal distributions with different means as well 
as different variances; this yields the class of multivariate normal mean-variance 
mixtures. 


Definition 3.11. The random vector $X$ is said to have a (multivariate) normal
mean-variance mixture distribution if

\[
X \overset{d}{=} m(W) + \sqrt{W} A Z, \tag{3.25}
\]

where
(i) $Z \sim N_k(0, I_k)$;
(ii) $W \geq 0$ is a non-negative, scalar-valued rv which is independent of $Z$;
(iii) $A \in \mathbb{R}^{d \times k}$ is a matrix; and
(iv) $m : [0, \infty) \to \mathbb{R}^d$ is a measurable function.


In this case we have that

\[
X \mid W = w \sim N_d(m(w), w\Sigma), \tag{3.26}
\]

where $\Sigma = AA'$ and it is clear why such distributions are known as mean-variance
mixtures of normals. In general, such distributions are not elliptical.
A possible concrete specification for the function $m(W)$ in (3.26) is

\[
m(W) = \mu + W\gamma, \tag{3.27}
\]

where $\mu$ and $\gamma$ are parameter vectors in $\mathbb{R}^d$. Since $E(X \mid W) = \mu + W\gamma$ and
$\operatorname{cov}(X \mid W) = W\Sigma$, it follows in this case by simple calculations that

\[
E(X) = E(E(X \mid W)) = \mu + E(W)\gamma, \tag{3.28}
\]

\[
\operatorname{cov}(X) = E(\operatorname{cov}(X \mid W)) + \operatorname{cov}(E(X \mid W))
= E(W)\Sigma + \operatorname{var}(W)\gamma\gamma', \tag{3.29}
\]

when the mixing variable $W$ has finite variance. We observe from (3.28) and (3.29)
that the parameters $\mu$ and $\Sigma$ are not, in general, the mean vector and covariance
matrix of $X$ (or a multiple thereof). This is only the case when $\gamma = 0$, so that the
distribution is a normal variance mixture and the simpler moment formulas given
in (3.20) apply.
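The moment formulas (3.28) and (3.29) translate directly into code. The sketch below is illustrative only: the function name and the numerical inputs are hypothetical, and the mixing variable is assumed to be gamma distributed with shape 2 and scale 1 purely so that $E(W)$ and $\operatorname{var}(W)$ are easy to state.

```python
def mixture_moments(mu, sigma, gamma, mean_w, var_w):
    """Mean vector and covariance matrix of X from (3.28) and (3.29)."""
    d = len(mu)
    mean_x = [mu[i] + mean_w * gamma[i] for i in range(d)]
    cov_x = [[mean_w * sigma[i][j] + var_w * gamma[i] * gamma[j]
              for j in range(d)] for i in range(d)]
    return mean_x, cov_x


# Hypothetical inputs: W ~ Gamma(shape 2, scale 1), so E(W) = 2 and var(W) = 2.
mu = [0.0, 0.0]
sigma = [[1.0, 0.3], [0.3, 1.0]]
gamma = [-0.5, 0.0]
mean_x, cov_x = mixture_moments(mu, sigma, gamma, mean_w=2.0, var_w=2.0)
```

In this example the first component has mean $-1$ although $\mu_1 = 0$, and its variance is inflated by the term $\operatorname{var}(W)\gamma_1^2$, which illustrates why $\mu$ and $\Sigma$ cannot be read as the mean and covariance when $\gamma \neq 0$.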


3.2.3 Generalized Hyperbolic Distributions 


In Example 3.8 we looked at the special subclass of the generalized hyperbolic
distributions consisting of the elliptically symmetric normal variance mixture
distributions. The full generalized hyperbolic family is obtained using the mean-variance
mixture construction (3.25) and the conditional mean specification (3.27). For the
mixing distribution we assume that $W \sim N^{-}(\lambda, \chi, \psi)$, a GIG distribution with
density (A.8).


Remark 3.12. This class of distributions has received a lot of attention in the 
financial-modelling literature, particularly in the univariate case. An important rea- 
son for this attention is their link to Lévy processes, i.e. processes with independent 
and stationary increments (like Brownian motion) that are used to model price pro- 
cesses in continuous time. For every generalized hyperbolic distribution it is possible 
to construct a Lévy process so that the value of the increment of the process over 
a fixed time interval has that distribution; this is only possible because the general- 
ized hyperbolic law is a so-called infinitely divisible distribution, a property that it 
inherits from the GIG mixing distribution of W. 


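The joint density in the non-singular case ($\Sigma$ has rank $d$) is

\[
f(x) = \int_0^\infty \frac{e^{(x-\mu)'\Sigma^{-1}\gamma}}{(2\pi)^{d/2}|\Sigma|^{1/2}w^{d/2}}
\exp\left\{-\frac{(x-\mu)'\Sigma^{-1}(x-\mu)}{2w} - \frac{\gamma'\Sigma^{-1}\gamma}{2/w}\right\} h(w)\,dw,
\]

where $h(w)$ is the density of $W$. Evaluation of this integral gives the generalized
hyperbolic density

\[
f(x) = c\,\frac{K_{\lambda-(d/2)}\bigl(\sqrt{(\chi + (x-\mu)'\Sigma^{-1}(x-\mu))(\psi + \gamma'\Sigma^{-1}\gamma)}\bigr)\,e^{(x-\mu)'\Sigma^{-1}\gamma}}
{\bigl(\sqrt{(\chi + (x-\mu)'\Sigma^{-1}(x-\mu))(\psi + \gamma'\Sigma^{-1}\gamma)}\bigr)^{(d/2)-\lambda}}, \tag{3.30}
\]

where the normalizing constant is

\[
c = \frac{(\sqrt{\chi\psi})^{-\lambda}\psi^{\lambda}(\psi + \gamma'\Sigma^{-1}\gamma)^{(d/2)-\lambda}}
{(2\pi)^{d/2}|\Sigma|^{1/2}K_\lambda(\sqrt{\chi\psi})}.
\]
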
Clearly, if $\gamma = 0$, the distribution reduces to the symmetric generalized hyperbolic
special case of Example 3.8. In general we have a non-elliptical distribution with
asymmetric margins. The mean vector and covariance matrix of the distribution
are easily calculated from (3.28) and (3.29) using the information on the GIG and
its moments given in Section A.2.5. The characteristic function of the generalized
hyperbolic distribution may be calculated using the same approach as in (3.21) to
yield

\[
\phi_X(t) = E(e^{it'X}) = e^{it'\mu}\hat{H}(\tfrac12 t'\Sigma t - it'\gamma), \tag{3.31}
\]

where $\hat{H}$ is the Laplace-Stieltjes transform of the GIG distribution.

We adopt the notation $X \sim \mathrm{GH}_d(\lambda, \chi, \psi, \mu, \Sigma, \gamma)$. Note that the distributions
$\mathrm{GH}_d(\lambda, \chi/k, k\psi, \mu, k\Sigma, k\gamma)$ and $\mathrm{GH}_d(\lambda, \chi, \psi, \mu, \Sigma, \gamma)$ are identical for
any $k > 0$, which causes an identifiability problem when we attempt to estimate the
parameters in practice. This can be solved by constraining the determinant $|\Sigma|$ to
be a particular value (such as one) when fitting. Note that, while such a constraint
will have an effect on the values of $\chi$ and $\psi$ that we estimate, it will not have an
effect on the value of $\chi\psi$, so this product is a useful summary parameter for the GH
distribution.
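A short derivation shows why the two parameter sets describe the same law. It assumes the scaling property of the GIG family, namely that $W \sim N^{-}(\lambda, \chi, \psi)$ implies $W/k \sim N^{-}(\lambda, \chi/k, k\psi)$ for $k > 0$ (a change of variable in the GIG density), and it introduces the auxiliary symbol $\tilde{W} = W/k$, which is not part of the original notation. Starting from (3.25) with the mean specification (3.27),

\[
X \overset{d}{=} \mu + W\gamma + \sqrt{W}AZ
= \mu + \tilde{W}(k\gamma) + \sqrt{\tilde{W}}\,(\sqrt{k}A)Z,
\qquad \tilde{W} \sim N^{-}(\lambda, \chi/k, k\psi),
\]

and since $(\sqrt{k}A)(\sqrt{k}A)' = k\Sigma$ the right-hand side is the representation of a $\mathrm{GH}_d(\lambda, \chi/k, k\psi, \mu, k\Sigma, k\gamma)$ vector. The product of the second and third parameters is $(\chi/k)(k\psi) = \chi\psi$, which is why it is unaffected by the choice of $k$.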


Linear combinations. The generalized hyperbolic class is closed under linear oper- 
ations. 


Proposition 3.13. If $X \sim \mathrm{GH}_d(\lambda, \chi, \psi, \mu, \Sigma, \gamma)$ and $Y = BX + b$, where
$B \in \mathbb{R}^{k \times d}$ and $b \in \mathbb{R}^k$, then $Y \sim \mathrm{GH}_k(\lambda, \chi, \psi, B\mu + b, B\Sigma B', B\gamma)$.


Proof. We calculate, using (3.31) and a similar method to Proposition 3.9, that

\[
\phi_Y(t) = e^{it'(B\mu+b)}\hat{H}(\tfrac12 t'B\Sigma B't - it'B\gamma).
\]

Thus the parameters inherited from the GIG mixing distribution remain unchanged
under linear operations. This means, for example, that margins of $X$ are
easy to calculate; we have that $X_i \sim \mathrm{GH}_1(\lambda, \chi, \psi, \mu_i, \Sigma_{ii}, \gamma_i)$. It also means that
it would be relatively easy to base a version of the variance-covariance method on
a generalized hyperbolic model for risk factors.


Parametrizations. There is a bewildering array of alternative parametrizations for
the generalized hyperbolic distribution in the literature and it is more common
to meet this distribution in a reparametrized form. In one common version the
dispersion matrix we call $\Sigma$ is renamed $\Delta$ and the constraint is imposed that $|\Delta| = 1$;
this addresses the identifiability problem mentioned above. The skewness parameters
$\gamma$ are replaced by parameters $\beta$ and the non-negative parameters $\chi$ and $\psi$ are
replaced by the non-negative parameters $\delta$ and $\alpha$ according to

\[
\beta = \Delta^{-1}\gamma, \qquad \delta = \sqrt{\chi}, \qquad \alpha = \sqrt{\psi + \gamma'\Delta^{-1}\gamma}.
\]

These parameters must satisfy the constraints $\delta \geq 0$, $\alpha^2 > \beta'\Delta\beta$ if $\lambda > 0$;
$\delta > 0$, $\alpha^2 > \beta'\Delta\beta$ if $\lambda = 0$; and $\delta > 0$, $\alpha^2 \geq \beta'\Delta\beta$ if $\lambda < 0$. Blæsild (1981)
uses this parametrization to show that generalized hyperbolic distributions form a
closed class of distributions under linear operations and conditioning. However, the
parametrization does have the problem that the important parameters $\alpha$ and $\delta$ are
not generally invariant under either of these operations.

It is useful to be able to move easily between our $\chi$-$\psi$-$\Sigma$-$\gamma$ parametrization,
as in (3.30), and the $\alpha$-$\delta$-$\Delta$-$\beta$ parametrization; $\lambda$ and $\mu$ are common to both
parametrizations. If the $\chi$-$\psi$-$\Sigma$-$\gamma$ parametrization is used, then the formulas for
obtaining the other parametrization are

\[
\Delta = |\Sigma|^{-1/d}\Sigma, \qquad \beta = \Sigma^{-1}\gamma,
\]

\[
\delta = \sqrt{\chi|\Sigma|^{1/d}}, \qquad \alpha = \sqrt{|\Sigma|^{-1/d}(\psi + \gamma'\Sigma^{-1}\gamma)}.
\]

If the $\alpha$-$\delta$-$\Delta$-$\beta$ form is used, then we can obtain our parametrization by setting

\[
\Sigma = \Delta, \qquad \gamma = \Delta\beta, \qquad \chi = \delta^2, \qquad \psi = \alpha^2 - \beta'\Delta\beta.
\]
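The two sets of conversion formulas can be checked against each other numerically. The following sketch is restricted to the bivariate case so that the determinant and inverse can be written out by hand; the helper names and the parameter values are hypothetical and serve only as an illustration.

```python
import math


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    det = det2(m)
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def to_alpha_delta(chi, psi, sigma, gamma):
    """(chi, psi, Sigma, gamma) -> (alpha, delta, Delta, beta) for d = 2."""
    scale = math.sqrt(det2(sigma))            # |Sigma|^{1/d} with d = 2
    delta_mat = [[s / scale for s in row] for row in sigma]
    beta = matvec(inv2(sigma), gamma)         # Sigma^{-1} gamma
    delta = math.sqrt(chi * scale)
    alpha = math.sqrt((psi + dot(gamma, beta)) / scale)
    return alpha, delta, delta_mat, beta


def to_chi_psi(alpha, delta, delta_mat, beta):
    """(alpha, delta, Delta, beta) -> (chi, psi, Sigma, gamma)."""
    gamma = matvec(delta_mat, beta)           # Delta beta
    chi = delta ** 2
    psi = alpha ** 2 - dot(beta, gamma)       # alpha^2 - beta' Delta beta
    return chi, psi, delta_mat, gamma


# Hypothetical starting values with |Sigma| = 7, i.e. not yet normalized.
chi, psi = 0.8, 1.5
sigma = [[4.0, 1.0], [1.0, 2.0]]
gamma = [0.3, -0.2]
alpha, delta, delta_mat, beta = to_alpha_delta(chi, psi, sigma, gamma)
chi_n, psi_n, sigma_n, gamma_n = to_chi_psi(alpha, delta, delta_mat, beta)
```

The round trip does not return the starting values but the equivalent normalized representative with $|\Sigma| = 1$; the product $\chi\psi$ is the same before and after, in line with the identifiability remark earlier in this section.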


Special cases. The multivariate generalized hyperbolic family is extremely flexible 
and, as we have mentioned, contains many special cases known by alternative names. 


If $\lambda = \tfrac12(d + 1)$ we drop the word "generalized" and refer to the distribution
as a d-dimensional hyperbolic distribution. Note that the univariate margins
of this distribution also have $\lambda = \tfrac12(d + 1)$ and are not one-dimensional
hyperbolic distributions.


If $\lambda = 1$ we get a multivariate distribution whose univariate margins are
one-dimensional hyperbolic distributions. The one-dimensional hyperbolic
distribution has been widely used in univariate analyses of financial return
data (see Notes and Comments).


If $\lambda = -\tfrac12$ then the distribution is known as an NIG distribution. In the
univariate case, this model has also been used in analyses of return data; its
functional form is similar to the hyperbolic with a slightly heavier tail. (Note
that the NIG and the GIG are different distributions!)


If $\lambda > 0$ and $\chi = 0$ we get a limiting case of the distribution known variously
as a generalized Laplace, Bessel function or variance-gamma distribution.


If $\lambda = -\tfrac12\nu$, $\chi = \nu$ and $\psi = 0$ we get another limiting case which seems
to have been less well studied, but which could be called an asymmetric or
skewed t distribution. Evaluating the limit of (3.30) as $\psi \to 0$ yields the
multivariate density

\[
f(x) = c\,\frac{K_{(\nu+d)/2}\bigl(\sqrt{(\nu + Q(x))\gamma'\Sigma^{-1}\gamma}\bigr)\exp((x-\mu)'\Sigma^{-1}\gamma)}
{\bigl(\sqrt{(\nu + Q(x))\gamma'\Sigma^{-1}\gamma}\bigr)^{-(\nu+d)/2}(1 + Q(x)/\nu)^{(\nu+d)/2}}, \tag{3.32}
\]

where $Q(x) = (x-\mu)'\Sigma^{-1}(x-\mu)$ and the normalizing constant is

\[
c = \frac{2^{1-(\nu+d)/2}}{\Gamma(\tfrac12\nu)(\pi\nu)^{d/2}|\Sigma|^{1/2}}.
\]

This density reduces to the standard multivariate t density in (3.23) as $\gamma \to 0$.


3.2.4 Fitting Generalized Hyperbolic Distributions to Data


While univariate generalized hyperbolic models have been fitted to return data in 
many empirical studies, there has been relatively little applied work with the multi- 
variate distributions. However, normal mixture distributions of the kind we have 
described may be fitted with algorithms of the EM (expectation-maximization)
type. In this section we present an algorithm for that purpose and sketch the ideas 
behind it. Similar methods have been developed independently by other authors and 
references may be found in Notes and Comments. Readers who are not particularly 
interested in getting an idea of how the estimation works may skip this section, while 
noting the existence of Algorithm 3.14. 

Assume we have iid data $X_1, \ldots, X_n$ and wish to fit the multivariate generalized
hyperbolic, or one of its special cases. Summarizing the parameters by
$\theta = (\lambda, \chi, \psi, \mu, \Sigma, \gamma)'$, the problem is to maximize

\[
\ln L(\theta; X_1, \ldots, X_n) = \sum_{i=1}^n \ln f_X(X_i; \theta), \tag{3.33}
\]

where $f_X(x; \theta)$ denotes the generalized hyperbolic density in (3.30).

This problem is not particularly easy at first sight due to the number of parameters
and the necessity of maximizing over covariance matrices $\Sigma$. However, if we were
able to "observe" the latent mixing variables $W_1, \ldots, W_n$ coming from the mixture
representation in (3.25), it would be much easier. Since the joint density of any pair
$X_i$ and $W_i$ is given by

\[
f_{X,W}(x, w; \theta) = f_{X|W}(x \mid w; \mu, \Sigma, \gamma)\,h_W(w; \lambda, \chi, \psi), \tag{3.34}
\]

we could construct the likelihood

\[
\ln L(\theta; X_1, \ldots, X_n, W_1, \ldots, W_n)
= \sum_{i=1}^n \ln f_{X|W}(X_i \mid W_i; \mu, \Sigma, \gamma) + \sum_{i=1}^n \ln h_W(W_i; \lambda, \chi, \psi), \tag{3.35}
\]

where the two terms could be maximized separately with respect to the parameters
they involve. The apparently more problematic parameters $\Sigma$ and $\gamma$ are in the
first term of the likelihood and estimates are relatively easy to derive due to the
Gaussian form of this term.

To overcome the latency of the $W_i$ data the EM algorithm is used. This is an
iterative procedure consisting of an E-step, or expectation step (where essentially $W_i$
is replaced by an estimate given the observed data and current parameter estimates),
and an M-step, or maximization step (where the parameter estimates are updated).
Suppose at the beginning of step $k$ we have parameter estimates $\theta^{[k]}$. We proceed
as follows.


E-step. We calculate the conditional expectation of the so-called augmented
likelihood (3.35) given the data $X_1, \ldots, X_n$ using the parameter values $\theta^{[k]}$. This
results in the objective function

\[
Q(\theta; \theta^{[k]}) = E(\ln L(\theta; X_1, \ldots, X_n, W_1, \ldots, W_n) \mid X_1, \ldots, X_n; \theta^{[k]}).
\]

M-step. We maximize the objective function with respect to $\theta$ to obtain the next
set of estimates $\theta^{[k+1]}$.


Alternating between these steps, the EM algorithm produces improved parameter 
estimates at each step (in the sense that the value of the original likelihood (3.33) is 
continually increased) and we converge to the maximum likelihood (ML) estimates. 

In practice, performing the E-step amounts to replacing any functions $g(W_i)$ of the
latent mixing variables which arise in (3.35) by the quantities $E(g(W_i) \mid X_i; \theta^{[k]})$.
To calculate these quantities we can observe that the conditional density of $W_i$ given
$X_i$ satisfies $f_{W|X}(w \mid x; \theta) \propto f_{W,X}(w, x; \theta)$, up to some constant of proportionality.
Thus it may be deduced from (3.34) that

\[
W_i \mid X_i \sim N^{-}\bigl(\lambda - \tfrac12 d,\; (X_i - \mu)'\Sigma^{-1}(X_i - \mu) + \chi,\; \psi + \gamma'\Sigma^{-1}\gamma\bigr). \tag{3.36}
\]

If we write out the likelihood (3.35) using (3.26) for the first term and the GIG
density (A.8) for the second term, we find that the functions $g(W_i)$ arising in (3.35)
are $g_1(w) = w$, $g_2(w) = 1/w$ and $g_3(w) = \ln(w)$. The conditional expectation of
these functions in model (3.36) may be evaluated using information about the GIG
distribution in Section A.2.5; note that $E(\ln(W_i) \mid X_i; \theta^{[k]})$ involves derivatives of
a Bessel function with respect to order and must be approximated numerically. We
will introduce the notation

\[
\delta_i^{[\cdot]} = E(W_i^{-1} \mid X_i; \theta^{[\cdot]}), \qquad
\eta_i^{[\cdot]} = E(W_i \mid X_i; \theta^{[\cdot]}), \qquad
\xi_i^{[\cdot]} = E(\ln(W_i) \mid X_i; \theta^{[\cdot]}), \tag{3.37}
\]

which allows us to describe the basic EM scheme as well as a variant below.
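As an illustration of how the weights in (3.37) might be computed, the sketch below evaluates GIG moments by numerical quadrature rather than through the Bessel-function formula of Section A.2.5. This substitution is an assumption made so that the example needs only the Python standard library; the function names, grid settings and parameter values are likewise hypothetical, and the weights are computed for a univariate model ($d = 1$) to avoid matrix algebra.

```python
import math


def gig_moment(power, lam, chi, psi, n_grid=4000, half_width=12.0):
    """E(W**power) for W ~ GIG(lam, chi, psi) with chi > 0 and psi > 0.

    Applies the trapezoidal rule in u = ln(w) to the unnormalized density
    w**(lam - 1) * exp(-(chi / w + psi * w) / 2).
    """
    centre = 0.5 * math.log(chi / psi)
    step = 2.0 * half_width / n_grid

    def integral(exponent):
        total = 0.0
        for j in range(n_grid + 1):
            u = centre - half_width + j * step
            value = math.exp(exponent * u
                             - 0.5 * (chi * math.exp(-u) + psi * math.exp(u)))
            total += value if 0 < j < n_grid else 0.5 * value
        return total * step

    # The substitution w = exp(u) turns w**(lam - 1) dw into exp(lam * u) du.
    return integral(lam + power) / integral(lam)


def e_step_weights(x, lam, chi, psi, mu, sigma2, gamma):
    """delta = E(1/W | X = x) and eta = E(W | X = x) from (3.36), with d = 1."""
    q = (x - mu) ** 2 / sigma2
    lam_c = lam - 0.5                      # lambda - d/2
    chi_c = q + chi
    psi_c = psi + gamma ** 2 / sigma2
    delta = gig_moment(-1.0, lam_c, chi_c, psi_c)
    eta = gig_moment(1.0, lam_c, chi_c, psi_c)
    return delta, eta


# Hypothetical observation and parameter values.
delta_i, eta_i = e_step_weights(x=0.03, lam=-0.5, chi=1.0, psi=1.0,
                                mu=0.0, sigma2=0.0004, gamma=0.0)
```

For $\lambda = -\tfrac12$ the GIG law is an inverse Gaussian distribution, whose mean $\sqrt{\chi/\psi}$ gives a convenient check on the quadrature. Note also that $\delta_i\eta_i \geq 1$ always holds by Jensen's inequality.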

In the M-step there are two terms to maximize, coming from the two terms
in (3.35); we write these as $Q_1(\mu, \Sigma, \gamma; \theta^{[k]})$ and $Q_2(\lambda, \chi, \psi; \theta^{[k]})$. To address the
identifiability issue mentioned in Section 3.2.3 we constrain the determinant of $\Sigma$
to be some fixed value (in practice we take the determinant of the sample covariance
matrix $S$) in the maximization of $Q_1$. The maximizing values of $\mu$, $\Sigma$ and $\gamma$ may
then be derived analytically by calculating partial derivatives and setting these equal
to zero; the resulting formulas are embedded in Algorithm 3.14 below (see steps (3)
and (4)). The maximization of $Q_2(\lambda, \chi, \psi; \theta^{[k]})$ with respect to the parameters of
the mixing distribution is performed numerically; the function $Q_2(\lambda, \chi, \psi; \theta^{[\cdot]})$ is

\[
(\lambda - 1)\sum_{i=1}^n \xi_i^{[\cdot]} - \tfrac12\chi\sum_{i=1}^n \delta_i^{[\cdot]} - \tfrac12\psi\sum_{i=1}^n \eta_i^{[\cdot]}
- \tfrac12 n\lambda\ln(\chi) + \tfrac12 n\lambda\ln(\psi) - n\ln\bigl(2K_\lambda(\sqrt{\chi\psi})\bigr). \tag{3.38}
\]

This would complete one iteration of a standard EM algorithm. However, there are
a couple of variants on the basic scheme; both involve modification of the final step
described above, namely the maximization of $Q_2$.
Assuming the parameters $\mu$, $\Sigma$ and $\gamma$ have been updated first in iteration $k$, we
define

\[
\theta^{[k,2]} = (\lambda^{[k]}, \chi^{[k]}, \psi^{[k]}, \mu^{[k+1]}, \Sigma^{[k+1]}, \gamma^{[k+1]})',
\]

recalculate the weights $\delta_i^{[k,2]}$, $\eta_i^{[k,2]}$ and $\xi_i^{[k,2]}$ in (3.37), and then maximize the
function $Q_2(\lambda, \chi, \psi; \theta^{[k,2]})$ in (3.38). This results in a so-called MCECM algorithm
(multi-cycle, expectation, conditional maximization), which is the one we present
below.

Alternatively, instead of maximizing $Q_2$ we may maximize the original likelihood
(3.33) with respect to $\lambda$, $\chi$ and $\psi$ with the other parameters held fixed at the
values $\mu^{[k]}$, $\Sigma^{[k]}$ and $\gamma^{[k]}$; this results in an ECME algorithm.


Algorithm 3.14 (EM estimation of generalized hyperbolic distribution). 


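(1) Set iteration count $k = 1$ and select starting values for $\theta^{[1]}$. In particular,
reasonable starting values for $\mu$, $\gamma$ and $\Sigma$, respectively, are the sample mean,
the zero vector and the sample covariance matrix $S$.


(2) Calculate weights $\delta_i^{[k]}$ and $\eta_i^{[k]}$ using (3.37), (3.36) and (A.9). Average the
weights to get

\[
\bar{\delta}^{[k]} = n^{-1}\sum_{i=1}^n \delta_i^{[k]} \quad\text{and}\quad \bar{\eta}^{[k]} = n^{-1}\sum_{i=1}^n \eta_i^{[k]}.
\]

(3) For a symmetric model set $\gamma^{[k+1]} = 0$. Otherwise set

\[
\gamma^{[k+1]} = \frac{n^{-1}\sum_{i=1}^n \delta_i^{[k]}(\bar{X} - X_i)}{\bar{\delta}^{[k]}\bar{\eta}^{[k]} - 1}.
\]

(4) Update estimates of the location vector and dispersion matrix by

\[
\mu^{[k+1]} = \frac{n^{-1}\sum_{i=1}^n \delta_i^{[k]}X_i - \gamma^{[k+1]}}{\bar{\delta}^{[k]}},
\]

\[
\Psi = \frac{1}{n}\sum_{i=1}^n \delta_i^{[k]}(X_i - \mu^{[k+1]})(X_i - \mu^{[k+1]})' - \bar{\eta}^{[k]}\gamma^{[k+1]}\gamma^{[k+1]\prime},
\qquad
\Sigma^{[k+1]} = \frac{|S|^{1/d}\,\Psi}{|\Psi|^{1/d}}.
\]

(5) Set

\[
\theta^{[k,2]} = (\lambda^{[k]}, \chi^{[k]}, \psi^{[k]}, \mu^{[k+1]}, \Sigma^{[k+1]}, \gamma^{[k+1]})'.
\]

Calculate weights $\delta_i^{[k,2]}$, $\eta_i^{[k,2]}$ and $\xi_i^{[k,2]}$ using (3.37), (3.36) and information
in Section A.2.5.


(6) Maximize $Q_2(\lambda, \chi, \psi; \theta^{[k,2]})$ in (3.38) with respect to $\lambda$, $\chi$ and $\psi$ to complete
the calculation of $\theta^{[k+1]}$. Increment iteration count $k \to k + 1$ and go to step (2).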
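Steps (3) and (4) of the algorithm are closed-form updates and can be sketched in a few lines. The version below is restricted to bivariate data so that the determinant can be written explicitly, and it takes the weights as given inputs; the function name and the toy data are hypothetical, and the constant weights used in the example are not the output of a real E-step.

```python
def m_step(xs, delta, eta, det_s):
    """Steps (3)-(4) of Algorithm 3.14 for bivariate data xs (list of [x1, x2]).

    delta and eta are the weights from (3.37); det_s is the determinant |S|
    to which the updated dispersion matrix is constrained.
    """
    n, d = len(xs), 2
    delta_bar = sum(delta) / n
    eta_bar = sum(eta) / n
    x_bar = [sum(x[i] for x in xs) / n for i in range(d)]

    # Step (3): skewness vector gamma (asymmetric model).
    denom = delta_bar * eta_bar - 1.0
    gamma = [sum(w * (x_bar[i] - x[i]) for w, x in zip(delta, xs)) / n / denom
             for i in range(d)]

    # Step (4): location vector mu, then dispersion matrix Sigma.
    mu = [(sum(w * x[i] for w, x in zip(delta, xs)) / n - gamma[i]) / delta_bar
          for i in range(d)]
    psi_mat = [[sum(w * (x[i] - mu[i]) * (x[j] - mu[j])
                    for w, x in zip(delta, xs)) / n
                - eta_bar * gamma[i] * gamma[j]
                for j in range(d)] for i in range(d)]
    det_psi = psi_mat[0][0] * psi_mat[1][1] - psi_mat[0][1] * psi_mat[1][0]
    scale = (det_s / det_psi) ** (1.0 / d)    # rescale so that |Sigma| = |S|
    sigma = [[scale * v for v in row] for row in psi_mat]
    return gamma, mu, sigma


# Toy data with constant (hypothetical) weights; det_s is the determinant of
# the covariance matrix of xs computed with divisor n.
xs = [[0.0, 0.0], [2.0, 1.0], [1.0, 3.0], [3.0, 2.0]]
gamma_new, mu_new, sigma_new = m_step(xs, delta=[2.0] * 4, eta=[1.0] * 4,
                                      det_s=1.3125)
```

With constant weights the update carries no information about skewness, so the sketch returns a zero skewness vector, the sample mean as location and a dispersion matrix equal to the (divisor $n$) sample covariance matrix, which makes it a convenient sanity check.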


This algorithm may be easily adapted to fit special cases of the generalized
hyperbolic distribution. This involves holding certain parameters fixed throughout and
maximizing with respect to the remaining parameters: for the hyperbolic distribution
we set $\lambda = 1$; for the NIG distribution $\lambda = -\tfrac12$; for the t distribution $\psi = 0$; for
the VG distribution $\chi = 0$. In the case of t and VG in step (6) we have to work with
the function $Q_2$ that results from assuming an inverse gamma or gamma density
for $h_W$.

3.2.5 Empirical Examples


In this section we fit the multivariate generalized hyperbolic (GH) distribution to real 
data and examine which of the subclasses—such as t, hyperbolic or NIG—are most 
useful; we also explore whether the general mean-variance mixture models can be 
replaced by (elliptically symmetric) variance mixtures. Our first example prepares 
the ground for multivariate examples by looking briefly at univariate models. 


Example 3.15 (univariate stock returns). In the literature the NIG, hyperbolic 
and t models have been particularly popular special cases. We fit symmetric and 
asymmetric cases of these distributions to the data used in Example 3.3, restricting 
attention to daily and weekly returns, where the data are more plentiful (n = 2020 
and n = 468, respectively). Models are fitted using maximum likelihood under 
the simplifying assumption that returns form iid samples; a simple quasi-Newton 
method provides a viable alternative to the EM algorithm in the univariate case. 

In the upper two panels of Table 3.3 we show results for symmetric models. The 
t, NIG and hyperbolic models may be compared directly using the log-likelihood 
at the maximum, since all have the same number of parameters: for daily data we 
find that eight out of 10 stocks prefer the t distribution to the hyperbolic and NIG 
distributions; for weekly returns the t distribution is favoured in six out of 10 cases.
Overall, the second best model appears to be the NIG distribution. The mixture 
models fit much better than the Gaussian model in all cases, and it may be verified 
easily using the Akaike information criterion (AIC) that they are preferred to the 
Gaussian model in a formal comparison (see Section A.3.6 for more on the AIC). 
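The AIC comparison can be reproduced from the log-likelihood values reported in Table 3.3. The snippet below is a small illustration for the AXP daily returns; the parameter counts (two for the Gaussian model, three for the symmetric t model) are an assumption based on counting location, scale and, for the t, the degrees of freedom.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2 * (number of parameters) - 2 * ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


# AXP daily returns, log-likelihood values taken from Table 3.3.
aic_gauss = aic(4945.7, 2)   # assumed count: location and scale
aic_t = aic(5001.8, 3)       # assumed count: location, scale, degrees of freedom
```

The model with the smaller AIC value is preferred, which here is the t model.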

For the asymmetric models, we only show cases where at least one of the asym- 
metric t, NIG or hyperbolic models offered a significant improvement (p < 0.05) 
on the corresponding symmetric model according to a likelihood ratio test. This 
occurred for weekly returns on Citigroup (C) and Intel (INTC) but for no daily 
returns. For Citigroup the p-values of the tests were, respectively, 0.06, 0.04 and 
0.04 for the t, NIG and hyperbolic cases; for Intel the p-values were 0.01 in all 
cases, indicating quite strong asymmetry. 
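The likelihood ratio p-values quoted above can be approximated from the rounded log-likelihoods in Table 3.3. The sketch assumes that each asymmetric univariate model has exactly one more parameter than its symmetric counterpart, so that the test statistic is referred to a chi-squared distribution with one degree of freedom; the function name is hypothetical.

```python
import math


def lr_test_one_df(loglik_restricted, loglik_general):
    """Likelihood ratio statistic and p-value for one extra parameter."""
    stat = 2.0 * (loglik_general - loglik_restricted)
    # For one degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Weekly returns, symmetric versus asymmetric t model (Table 3.3).
stat_intc, p_intc = lr_test_one_df(611.0, 614.2)   # Intel
stat_c, p_c = lr_test_one_df(669.6, 671.4)         # Citigroup
```

Because the tabulated log-likelihoods are rounded to one decimal place, p-values recomputed in this way can differ slightly from those reported in the text.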

In the case of Intel we have superimposed the densities of various fitted asymmet- 
ric distributions on a histogram of the data in Figure 3.4. A plot of the log densities 
shown alongside reveals the differences between the distributions in the tail area. 
The left tail (corresponding to losses) appears to be heavier for these data and the 
best-fitting distribution according to the likelihood comparison is the asymmetric 
t distribution. 


Table 3.3. Comparison of univariate models in the generalized hyperbolic family, showing
estimates of selected parameters and the value of the log-likelihood at the maximum;
bold numbers indicate the models that give the largest values of the log-likelihood. See
Example 3.15 for commentary.

          Gauss        t model           NIG model        Hyperbolic model
Stock      ln L       ν      ln L      √(χψ)    ln L      √(χψ)    ln L

Daily returns: symmetric models
AXP      4945.7     5.8    5001.8      1.6    5002.4      1.3    5002.1
EK       5112.9     3.8    5396.2      0.8    5382.5      0.6    5366.0
BA       5054.9     3.8    5233.5      0.8    5229.1      0.5    5221.2
C        4746.6     6.3    4809.5      1.9    4806.8      1.7    4805.0
KO       5319.6     5.1    5411.0      1.4    5407.3      1.3    5403.3
MSFT     4724.3     5.8    4814.6      1.6    4809.5      1.5    4806.4
HWP      4480.1     4.5    4588.8      1.1    4587.2      0.9    4583.4
INTC     4392.3     5.4    4492.2      1.5    4486.7      1.4    4482.4
JPM      4898.3     5.1    4967.8      1.3    4969.5      0.9    4969.7
DIS      5047.2     4.4    5188.3      1.0    5183.8      0.8    5177.6

Weekly returns: symmetric models
AXP       719.9     8.8     724.2      3.0     724.3      2.8     724.3
EK        718.7     3.6     765.6      0.7     764.0      0.5     761.3
BA        732.4     4.4     759.2      1.0     758.3      0.8     757.2
C         656.0     5.7     669.6      1.6     669.3      1.3     669.0
KO        757.1     6.0     765.7      1.7     766.2      1.3     766.3
MSFT      671.5     6.3     683.9      1.9     683.2      1.8     682.9
HWP       627.1     6.0     637.3      1.8     637.3      1.5     637.1
INTC      595.8     5.2     611.0      1.5     610.6      1.3     610.0
JPM       681.7     5.9     693.0      1.7     692.9      1.5     692.6
DIS       734.1     6.4     742.7      1.9     742.8      1.7     742.7

Weekly returns: asymmetric models
C            NA     6.1     671.4      1.7     671.3      1.3     671.2
INTC         NA     6.3     614.2      1.8     613.9      1.7     613.3


Example 3.16 (multivariate stock returns). We fitted multivariate models to the
full 10-dimensional dataset of log-returns used in the previous example. The resulting
values of the maximized log-likelihood are shown in Table 3.4 along with p-values
for a likelihood ratio test of all special cases against the (asymmetric) generalized
hyperbolic (GH) model. The number of parameters in each model is also given;
note that the general d-dimensional GH model has $\tfrac12 d(d + 1)$ dispersion parameters,
$d$ location parameters, $d$ skewness parameters and three parameters coming
from the GIG mixing distribution, but is subject to one identifiability constraint; this
gives $\tfrac12(d(d + 5) + 4)$ free parameters.
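The parameter count is easy to verify by adding up the individual contributions. The helper below is a hypothetical illustration; for d = 10 and d = 4 it gives the values listed for the general GH model in Tables 3.4 and 3.5.

```python
def gh_free_parameters(d):
    """Number of free parameters of the general d-dimensional GH model."""
    dispersion = d * (d + 1) // 2   # distinct entries of the symmetric matrix Sigma
    location = d                    # mu
    skewness = d                    # gamma
    mixing = 3                      # lambda, chi, psi
    constraints = 1                 # identifiability constraint on |Sigma|
    return dispersion + location + skewness + mixing - constraints


count_stocks = gh_free_parameters(10)
count_fx = gh_free_parameters(4)
```

The sum agrees with the closed form $\tfrac12(d(d+5)+4)$ stated above.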

For the daily data the best of the special cases is the skewed t distribution, which
gives a value for the maximized likelihood that cannot be discernibly improved
by the more general model with its additional parameter. All other non-elliptically
symmetric submodels are rejected in a likelihood ratio test. Note, however, that the
elliptically symmetric t distribution cannot be rejected when compared with the
most general model, so that this seems to offer a simple parsimonious model for
these data (the estimated degree of freedom is 6.0).

For the weekly data the best special case is the NIG distribution, followed closely
by the skewed t; the hyperbolic and variance gamma are rejected. The best elliptically
symmetric special case seems to be the t distribution (the estimated degree of
freedom being, this time, 6.2).

Figure 3.4. Models for weekly returns on Intel (INTC).


Example 3.17 (multivariate exchange-rate returns). We fitted the same multi- 
variate models to a four-dimensional dataset of exchange-rate log-returns, these 
being GB pound, euro, Japanese yen and Swiss franc against the US dollar for 
the period January 2000 to the end of March 2004 (1067 daily returns and 222 
weekly returns). The resulting values of the maximized log-likelihood are shown in 
Table 3.5. 

For the daily data the best of the special cases (in general and also if we restrict 
ourselves to symmetric models) is the NIG distribution, followed by the hyperbolic, 
t and variance-gamma (VG) distributions in that order. In a likelihood ratio test of 
the special cases against the general GH distribution only the VG model is rejected 
at the 5% level; the skewed t model is rejected at the 10% level. When tested against 
the full model, certain elliptical models could not be rejected, the best of these being 
the NIG. 

For the weekly data the best special case is the t distribution, followed by the
NIG, hyperbolic and variance gamma; none of the special cases can be rejected in a
test at the 5% level although the VG model is rejected at the 10% level. Among the
elliptically symmetric distributions the Gauss distribution is clearly rejected, and
the VG is again rejected at the 10% level, but otherwise the elliptical special cases
are accepted; the best of these seems to be the t distribution, which has an estimated
degrees-of-freedom parameter of 5.99.




Table 3.4. A comparison of models in the GH family for 10-dimensional stock-return data.
For each model, the table shows the value of the log-likelihood at the maximum (ln L), the
numbers of parameters (# par.) and the p-value for a likelihood ratio test against the general
GH model. The log-likelihood values for the general model, the best special case and the best
elliptically symmetric special case are in bold type. See Example 3.16 for details.


                GH         NIG     Hyperbolic       t          VG        Gauss

Daily returns: asymmetric models
ln L        52174.62   52141.45   52111.65   52174.62   52063.44
# par.            77         76         76         76         76
p-value                    0.00       0.00       1.00       0.00

Daily returns: symmetric models
ln L        52170.14   52136.55   52106.34   52170.14   52057.38   50805.28
# par.            67         66         66         66         66         65
p-value         0.54       0.00       0.00       0.63       0.00       0.00

Weekly returns: asymmetric models
ln L         7639.32    7638.59    7636.49    7638.56    7631.33
p-value                    0.23       0.02       0.22       0.00

Weekly returns: symmetric models
ln L         7633.65    7632.68    7630.44    7633.11     7625.4    7433.77
p-value         0.33       0.27       0.09       0.33       0.00       0.00


Table 3.5. A comparison of models in the GH family for four-dimensional exchange-rate
return data. For each model, the table shows the value of the log-likelihood at the maximum
(ln L), the numbers of parameters (# par.) and the p-value for a likelihood ratio test against
the general GH model. The log-likelihood values for the general model, the best special case
and the best elliptically symmetric special case are in bold type. See Example 3.17 for details.


                GH         NIG     Hyperbolic       t          VG        Gauss

Daily returns: asymmetric models
ln L        17306.44   17306.43   17305.61   17304.97    17302.5
# par.            20         19         19         19         19
p-value                    0.85       0.20       0.09       0.00

Daily returns: symmetric models
ln L        17303.10   17303.06   17302.15   17301.85   17299.15   17144.38
# par.            16         15         15         15         15         14
p-value         0.15       0.24       0.13       0.10       0.01       0.00

Weekly returns: asymmetric models
ln L         2890.65    2889.90    2889.65    2890.65    2888.98
p-value                    0.22       0.16       1.00       0.07

Weekly returns: symmetric models
ln L         2887.52    2886.74    2886.48    2887.52    2885.86    2872.36
p-value         0.18       0.17       0.14       0.28       0.09       0.00

Notes and Comments


Important early papers on multivariate normal mixtures are Kelker (1970) and Cam- 
banis, Huang and Simons (1981). See also Bingham and Kiesel (2002), which 
contains an overview of the connections between normal mixture, elliptical and 
hyperbolic models, and discusses their role in financial modelling. Fang, Kotz and 
Ng (1987) discuss the symmetric normal mixture models as special cases in their 
account of the more general family of spherical and elliptical distributions. 

The generalized hyperbolic distributions (univariate and multivariate) were introduced
in Barndorff-Nielsen (1978) and further explored in Barndorff-Nielsen and
Blæsild (1981). Useful references on the multivariate distribution are Blæsild (1981)
and Blæsild and Jensen (1981). Generalized hyperbolic distributions (particularly
in the univariate case) have been popularized as models for financial returns in
recent papers by Eberlein and Keller (1995) and Eberlein, Keller and Prause (1998)
(see also Bibby and Sørensen 2003). The PhD thesis of Prause (1999) is also a
compendium of useful information in this context.

The reasons for their popularity in financial applications are both empirical and 
theoretical: they appear to provide a good fit to financial return data (again mostly in 
univariate investigations); they are consistent with continuous-time models, where 
logarithmic asset prices follow univariate or multivariate Lévy processes (thus 
generalizing the Black-Scholes model, where logarithmic prices follow Brownian 
motion) (see Eberlein and Keller 1995). 

For the NIG special case see Barndorff-Nielsen (1997), who discusses both uni- 
variate and multivariate cases and argues that the NIG is slightly superior to the 
hyperbolic as a univariate model for return data, a claim that our analyses support 
for stock-return data. Kotz, Kozubowski and Podgórski (2001) is a useful reference 
for the variance-gamma special case; the distribution appears here under the name 
generalized Laplace distribution and a (univariate or multivariate) Lévy process with 
variance-gamma-distributed increments is called a Laplace motion. The univariate 
Laplace motion is essentially the model proposed by Madan and Seneta (1990), 
who derived it as a Brownian motion under a stochastic time change and referred 
to it as the variance-gamma model (see also Madan, Carr and Chang 1998). The 
multivariate ¢ distribution is discussed in Kotz and Nadarajah (2004); the asymmet- 
ric or skewed ¢ distribution presented in this chapter is also discussed in Bibby and 
Sgrensen (2003). For alternative skewed extensions of the multivariate t, see Kotz 
and Nadarajah (2004) and Genton (2004). 

EM algorithms for the multivariate generalized hyperbolic distribution have been 
independently proposed by Protassov (2004) and Barndorff-Nielsen and Shep- 
hard (2005). Our approach is based on EM-type algorithms for fitting the multi- 
variate ¢ distribution with unknown degrees of freedom. A good starter reference 
on this subject is Liu and Rubin (1995), where the use of the MCECM algorithm 
of Meng and Rubin (1993) and the ECME algorithm proposed in Liu and Rubin 
(1994) is discussed. Further refinements of these algorithms are discussed in Liu 
(1997) and Meng and van Dyk (1997).

3.3 Spherical and Elliptical Distributions


In the previous section we observed that elliptical distributions—in particular the 
multivariate t and symmetric multivariate NIG—provided far superior models to 
the multivariate normal for daily and weekly US stock-return data. The more gen- 
eral asymmetric versions of these distributions did not seem to offer much of an 
improvement on the symmetric models. While this was a single example, other 
investigations suggest that multivariate return data for groups of returns of a similar 
type often look roughly elliptical. 

In this section we look more closely at the theory of elliptical distributions. To do 
this we begin with the special case of spherical distributions. 


3.3.1 Spherical Distributions 


The spherical family constitutes a large class of distributions for random vectors 
with uncorrelated components and identical, symmetric marginal distributions. It is 
important to note that within this class, $N_d(0, I_d)$ is the only model for a vector of
mutually independent components. Many of the properties of elliptical distributions 
can best be understood by beginning with spherical distributions. 


Definition 3.18. A random vector $X = (X_1, \dots, X_d)'$ has a spherical distribution
if, for every orthogonal map $U \in \mathbb{R}^{d \times d}$ (i.e. maps satisfying $UU' = U'U = I_d$),

$$UX \stackrel{d}{=} X.$$


Thus spherical random vectors are distributionally invariant under rotations. There 
are a number of different ways of defining distributions with this property, as we 
demonstrate below. 


Theorem 3.19. The following are equivalent.
(1) $X$ is spherical.
(2) There exists a function $\psi$ of a scalar variable such that, for all $t \in \mathbb{R}^d$,
$$\phi_X(t) = E(e^{it'X}) = \psi(t't) = \psi(t_1^2 + \cdots + t_d^2). \tag{3.39}$$
(3) For every $a \in \mathbb{R}^d$,
$$a'X \stackrel{d}{=} \|a\| X_1, \tag{3.40}$$
where $\|a\|^2 = a'a = a_1^2 + \cdots + a_d^2$.

Proof. (1) $\Rightarrow$ (2). If $X$ is spherical, then for any orthogonal matrix $U$ we have

$$\phi_X(t) = \phi_{UX}(t) = E(e^{it'UX}) = \phi_X(U't).$$

This can only be true if $\phi_X(t)$ only depends on the length of $t$, i.e. if $\phi_X(t) = \psi(t't)$
for some function $\psi$ of a non-negative scalar variable.

(2) $\Rightarrow$ (3). First observe that $\phi_{X_1}(t) = E(e^{itX_1}) = \phi_X(te_1) = \psi(t^2)$, where $e_1$
denotes the first unit vector in $\mathbb{R}^d$. It follows that for any $a \in \mathbb{R}^d$,

$$\phi_{a'X}(t) = \phi_X(ta) = \psi(t^2 a'a) = \psi(t^2 \|a\|^2) = \phi_{X_1}(t\|a\|) = \phi_{\|a\|X_1}(t).$$

(3) $\Rightarrow$ (1). For any orthogonal matrix $U$ we have

$$\phi_{UX}(t) = E(e^{i(U't)'X}) = E(e^{i\|U't\|X_1}) = E(e^{i\|t\|X_1}) = E(e^{it'X}) = \phi_X(t).$$


Part (2) of Theorem 3.19 shows that the characteristic function of a spherically 
distributed random vector is fully described by a function $\psi$ of a scalar variable. For
this reason $\psi$ is known as the characteristic generator of the spherical distribution
and the notation $X \sim S_d(\psi)$ is used. Part (3) of Theorem 3.19 shows that linear
combinations of spherical random vectors always have a distribution of the same 
type, so that they have the same distribution up to changes of location and scale 
(see Section A.1.1). This important property will be used in Chapter 6 to prove the 
subadditivity of Value-at-Risk for linear portfolios of elliptically distributed risk 
factors. We now give examples of spherical distributions. 


Example 3.20 (multivariate normal). A random vector $X$ with the standard uncorrelated
normal distribution $N_d(0, I_d)$ is clearly spherical. The characteristic function
is

$$\phi_X(t) = E(\exp(it'X)) = \exp(-\tfrac{1}{2} t't),$$

so that, using part (2) of Theorem 3.19, $X \sim S_d(\psi)$ with characteristic generator
$\psi(t) = \exp(-\tfrac{1}{2} t)$.


Example 3.21 (normal variance mixtures). A random vector $X$ with a standardized,
uncorrelated normal variance mixture distribution $M_d(0, I_d, \hat{H})$ also has a
spherical distribution. Using (3.21), we see that $\phi_X(t) = \hat{H}(\tfrac{1}{2} t't)$, which obviously
satisfies (3.39), and the characteristic generator of the spherical distribution is
related to the Laplace-Stieltjes transform of the mixture distribution function of $W$
by $\psi(t) = \hat{H}(\tfrac{1}{2} t)$. Thus $X \sim M_d(0, I_d, \hat{H}(\cdot))$ and $X \sim S_d(\hat{H}(\tfrac{1}{2}\,\cdot))$ are two ways
of writing the same mixture distribution.


A further, extremely important way of characterizing spherical distributions is 
given by the following result. 


Theorem 3.22. $X$ has a spherical distribution if and only if it has the stochastic
representation
$$X \stackrel{d}{=} RS, \tag{3.41}$$
where $S$ is uniformly distributed on the unit sphere $\mathcal{S}^{d-1} = \{s \in \mathbb{R}^d : s's = 1\}$ and
$R \geq 0$ is a radial rv, independent of $S$.

Proof. First we prove that if $S$ is uniformly distributed on the unit sphere and $R \geq 0$
is an independent scalar variable, then $RS$ has a spherical distribution. This is seen
by considering the characteristic function

$$\phi_{RS}(t) = E(e^{iRt'S}) = E(E(e^{iRt'S} \mid R)).$$

Since $S$ is itself spherically distributed, its characteristic function has a characteristic
generator, which is usually given the special notation $\Omega_d$. Thus, by Theorem 3.19 (2),
we have that

$$\phi_{RS}(t) = E(\Omega_d(R^2 t't)) = \int_0^\infty \Omega_d(r^2 t't)\, dF(r), \tag{3.42}$$

where $F$ is the df of $R$. Since this is a function of $t't$, it follows, again from
Theorem 3.19 (2), that $RS$ has a spherical distribution.

We now prove that if the random vector $X$ is spherical, then it has the representation
(3.41). For any arbitrary $s \in \mathcal{S}^{d-1}$ the characteristic generator $\psi$ of $X$ must
satisfy $\psi(t't) = \phi_X(t) = \phi_X(\|t\| s)$. It follows that, if we introduce a random vector
$S$ that is uniformly distributed on the sphere $\mathcal{S}^{d-1}$, we can write

$$\psi(t't) = \int_{\mathcal{S}^{d-1}} \phi_X(\|t\| s)\, dF_S(s) = \int_{\mathcal{S}^{d-1}} E(e^{i\|t\| s'X})\, dF_S(s).$$

Interchanging the order of integration and using the $\Omega_d$ notation for the characteristic
generator of $S$ we have

$$\psi(t't) = E(\Omega_d(\|t\|^2 \|X\|^2)) = \int_0^\infty \Omega_d(t't\, r^2)\, dF_{\|X\|}(r), \tag{3.43}$$

where $F_{\|X\|}$ is the df of $\|X\|$. By comparison with (3.42) we see that (3.43) is the
characteristic function of $RS$, where $R$ is an rv with df $F_{\|X\|}$ that is independent
of $S$.


We often exclude from consideration distributions which place point mass at 
the origin; that is, we consider spherical rvs $X$ in the subclass $S_d^+(\psi)$ for which
$P(X = 0) = 0$. A particularly useful corollary of Theorem 3.22 is then the following
result, which is used in Section 3.3.5 to devise tests for spherical and elliptical 
symmetry. 


Corollary 3.23. Suppose $X \stackrel{d}{=} RS \sim S_d^+(\psi)$. Then

$$\left(\|X\|, \frac{X}{\|X\|}\right) \stackrel{d}{=} (R, S). \tag{3.44}$$

Proof. Let $f_1(x) = \|x\|$ and $f_2(x) = x/\|x\|$. It follows from (3.41) that

$$\left(\|X\|, \frac{X}{\|X\|}\right) = (f_1(X), f_2(X)) \stackrel{d}{=} (f_1(RS), f_2(RS)) = (R, S).$$

Example 3.24 (working with R and S). Suppose $X \sim N_d(0, I_d)$. Since $X'X \sim \chi_d^2$,
a chi-squared distribution with $d$ degrees of freedom, it follows from (3.44) that
$R^2 \sim \chi_d^2$.
We can use this fact to calculate $E(S)$ and $\operatorname{cov}(S)$, the first two moments of a
uniform distribution on the unit sphere. We have that

$$0 = E(X) = E(R)E(S) \;\Rightarrow\; E(S) = 0,$$
$$I_d = \operatorname{cov}(X) = E(R^2)\operatorname{cov}(S) \;\Rightarrow\; \operatorname{cov}(S) = I_d/d, \tag{3.45}$$

since $E(R^2) = d$ when $R^2 \sim \chi_d^2$.

Now suppose that $X$ has a spherical normal variance mixture distribution
$X \sim M_d(0, I_d, \hat{H})$ and we wish to calculate the distribution of $R^2 \stackrel{d}{=} X'X$ in this case.
Since $X \stackrel{d}{=} \sqrt{W}\,Y$, where $Y \sim N_d(0, I_d)$ and $W$ is independent of $Y$, it follows that
$R^2 \stackrel{d}{=} W\tilde{R}^2$, where $\tilde{R}^2 \sim \chi_d^2$ and $W$ and $\tilde{R}$ are independent. If we can calculate the
distribution of the product of $W$ and an independent chi-squared variate, then we
have the distribution of $R^2$.

For a concrete example suppose that $X \sim t_d(\nu, 0, I_d)$. For a multivariate t distribution
we know from Example 3.7 that $W \sim \operatorname{Ig}(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)$, which means that
$\nu/W \sim \chi_\nu^2$. Using the fact that the ratio of independent chi-squared distributions
divided by their degrees of freedom is F-distributed, it may be calculated
that $R^2/d \sim F(d, \nu)$, the F distribution on $d$ and $\nu$ degrees of freedom (see
Section A.2.3). Since an $F(d, \nu)$ distribution has mean $\nu/(\nu - 2)$, it follows from (3.45)
that

$$\operatorname{cov}(X) = E(\operatorname{cov}(RS \mid R)) = E(R^2 I_d/d) = (\nu/(\nu - 2))\, I_d.$$
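
The moment calculations of Example 3.24 can be checked by simulation. The following is a minimal sketch in Python with NumPy; the dimension, sample size, seed, degrees of freedom and all variable names are illustrative assumptions of ours and are not part of the example itself. It splits standard normal draws into their radial and directional parts and compares sample moments with (3.45), then builds spherical t draws as $\sqrt{W}\,Y$ with $\nu/W \sim \chi_\nu^2$.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
d, n = 5, 200_000                # illustrative dimension and sample size

X = rng.standard_normal((n, d))  # rows are draws from N_d(0, I_d)
R = np.linalg.norm(X, axis=1)    # radial parts R = ||X||
S = X / R[:, None]               # directions S = X/||X|| on the unit sphere

mean_S = S.mean(axis=0)          # should be close to the zero vector
cov_S = np.cov(S, rowvar=False)  # should be close to I_d / d
mean_R2 = np.mean(R**2)          # should be close to d

# Spherical multivariate t: X = sqrt(W) * Y with nu/W ~ chi-squared(nu)
nu = 6.0                         # assumed degrees of freedom (needs nu > 2)
W = nu / rng.chisquare(nu, size=n)
X_t = np.sqrt(W)[:, None] * X
cov_t = np.cov(X_t, rowvar=False)  # should be close to nu/(nu - 2) * I_d
```

With these settings the sample covariance of the t draws should be close to $1.5\, I_d$, in line with the final display of the example.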


The normal mixtures with $\mu = 0$ and $\Sigma = I_d$ represent an easily understood
subgroup of the spherical distributions. There are other spherical distributions which
cannot be represented as normal variance mixtures; an example is the distribution
of the uniform vector $S$ on $\mathcal{S}^{d-1}$ itself. However, the normal mixtures have a special
role in the spherical world, as summarized by the following theorem. 


Theorem 3.25. Denote by $\Psi_\infty$ the set of characteristic generators that generate a
$d$-dimensional spherical distribution for arbitrary $d \geq 1$. Then $X \sim S_d(\psi)$ with
$\psi \in \Psi_\infty$ if and only if $X \stackrel{d}{=} \sqrt{W}\,Z$, where $Z \sim N_d(0, I_d)$ is independent of $W \geq 0$.


Proof. This is proved in Fang, Kotz and Ng (1987, pp. 48-51). 


Thus, the characteristic generators of normal mixtures generate spherical distri- 
butions in arbitrary dimensions, while other spherical generators may only be used 
in certain dimensions. A concrete example is given by the uniform distribution on 
the unit sphere. Let $\Omega_d$ denote the characteristic generator of the uniform vector
$S = (S_1, \dots, S_d)'$ on $\mathcal{S}^{d-1}$. It can be shown that $\Omega_d((t_1, \dots, t_{d+1})'(t_1, \dots, t_{d+1}))$
is not the characteristic function of a spherical distribution in $\mathbb{R}^{d+1}$ (for more details
see Fang, Kotz and Ng (1987, pp. 70-72)).

If a spherical distribution has a density $f$, then, by using the inversion formula

$$f(x) = \frac{1}{(2\pi)^d} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-it'x} \phi_X(t)\, dt_1 \cdots dt_d,$$

it is easily inferred from Theorem 3.19 that $f(x) = f(Ux)$ for any orthogonal
matrix $U$, so that the density must be of the form

$$f(x) = g(x'x) = g(x_1^2 + \cdots + x_d^2) \tag{3.46}$$

for some function $g$ of a scalar variable, which is referred to as the density generator.
Clearly, the joint density is constant on hyperspheres $\{x : x_1^2 + \cdots + x_d^2 = c\}$ in $\mathbb{R}^d$.
To give a single example, the density generator of the multivariate t (i.e. the model
$X \sim t_d(\nu, 0, I_d)$ of Example 3.7) is

$$g(x) = \frac{\Gamma(\tfrac{1}{2}(\nu + d))}{\Gamma(\tfrac{1}{2}\nu)(\pi\nu)^{d/2}} \left(1 + \frac{x}{\nu}\right)^{-(\nu + d)/2}.$$

3.3.2 Elliptical Distributions 

Definition 3.26. $X$ has an elliptical distribution if

$$X \stackrel{d}{=} \mu + AY,$$

where $Y \sim S_k(\psi)$ and $A \in \mathbb{R}^{d \times k}$ and $\mu \in \mathbb{R}^d$ are a matrix and vector of constants,
respectively.

In other words, elliptical distributions are obtained by multivariate affine transformations
of spherical distributions. Since the characteristic function is

$$\phi_X(t) = E(e^{it'X}) = E(e^{it'(\mu + AY)}) = e^{it'\mu} E(e^{i(A't)'Y}) = e^{it'\mu} \psi(t'\Sigma t),$$

where $\Sigma = AA'$, we denote the elliptical distributions by

$$X \sim E_d(\mu, \Sigma, \psi),$$

and refer to $\mu$ as the location vector, $\Sigma$ as the dispersion matrix and $\psi$ as the
characteristic generator of the distribution.


Remark 3.27. Knowledge of the distribution of $X$ does not uniquely determine its elliptical
representation $E_d(\mu, \Sigma, \psi)$. Although $\mu$ is uniquely determined, $\Sigma$ and $\psi$ are only
determined up to a positive constant. For example, the multivariate normal distribution
$N_d(\mu, \Sigma)$ can be written as $E_d(\mu, \Sigma, \psi(\cdot))$ or $E_d(\mu, c\Sigma, \psi(\cdot/c))$ for
$\psi(u) = \exp(-\tfrac{1}{2}u)$ and any $c > 0$. Provided that variances are finite, an elliptical
distribution is fully specified by its mean vector, covariance matrix and characteristic
generator, and it is possible to find an elliptical representation $E_d(\mu, \Sigma, \psi)$
such that $\Sigma$ is the covariance matrix of $X$, although this is not always the standard
representation of the distribution.


We now give an alternative stochastic representation for the elliptical distributions
that follows directly from Definition 3.26 and Theorem 3.22.

Proposition 3.28. $X \sim E_d(\mu, \Sigma, \psi)$ if and only if there exist $S$, $R$ and $A$ satisfying
$$X \stackrel{d}{=} \mu + RAS, \tag{3.47}$$
with
(i) $S$ uniformly distributed on the unit sphere $\mathcal{S}^{k-1} = \{s \in \mathbb{R}^k : s's = 1\}$,
(ii) $R \geq 0$, a radial rv, independent of $S$, and
(iii) $A \in \mathbb{R}^{d \times k}$ with $AA' = \Sigma$.

For practical examples we are most interested in the case where $\Sigma$ is positive
definite. The relation between the elliptical and spherical cases is then clearly

$$X \sim E_d(\mu, \Sigma, \psi) \iff \Sigma^{-1/2}(X - \mu) \sim S_d(\psi). \tag{3.48}$$

In this case, if the spherical vector $Y$ has density generator $g$, then $X = \mu + \Sigma^{1/2} Y$
has density

$$f(x) = \frac{1}{|\Sigma|^{1/2}}\, g((x - \mu)'\Sigma^{-1}(x - \mu)).$$

The joint density is always constant on sets of the form $\{x : (x - \mu)'\Sigma^{-1}(x - \mu) = c\}$,
which are ellipsoids in $\mathbb{R}^d$. Clearly, the full family of multivariate normal variance
mixtures with general location and dispersion parameters $\mu$ and $\Sigma$ are elliptical,
since they are obtained by affine transformations of the spherical special cases
considered in the previous section.
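
The representation (3.47) doubles as a simulation recipe. The sketch below (Python with NumPy) is one way it might be used to draw from a bivariate t distribution; the location vector, dispersion matrix, degrees of freedom and the use of a Cholesky factor for $A$ are hypothetical choices of ours, and any $A$ with $AA' = \Sigma$ would serve equally well. The radial part uses the fact from Example 3.24 that $R^2/d \sim F(d, \nu)$ for the multivariate t.

```python
import numpy as np

rng = np.random.default_rng(1)                 # arbitrary seed
n = 100_000
mu = np.array([1.0, -2.0])                     # hypothetical location vector
Sigma = np.array([[4.0, 1.2], [1.2, 1.0]])     # hypothetical dispersion matrix
A = np.linalg.cholesky(Sigma)                  # one choice of A with A A' = Sigma
d = len(mu)

# S uniform on the unit sphere: normalise standard normal vectors
Z = rng.standard_normal((n, d))
S = Z / np.linalg.norm(Z, axis=1, keepdims=True)

# Radial part for a multivariate t with nu degrees of freedom: R^2/d ~ F(d, nu)
nu = 8.0
R = np.sqrt(d * rng.f(d, nu, size=n))

X = mu + R[:, None] * (S @ A.T)                # X = mu + R A S, one draw per row

# With finite variances, cov(X) = (E(R^2)/d) * Sigma = nu/(nu - 2) * Sigma here
sample_cov = np.cov(X, rowvar=False)
implied_cov = nu / (nu - 2) * Sigma
```

Swapping the law of `R` while keeping `S` and `A` fixed changes the member of the elliptical family without changing the dispersion matrix.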

It follows from (3.44) and (3.48) that for a non-singular elliptical variate
$X \sim E_d(\mu, \Sigma, \psi)$ with no point mass at $\mu$ we have

$$\left(\sqrt{(X - \mu)'\Sigma^{-1}(X - \mu)},\; \frac{\Sigma^{-1/2}(X - \mu)}{\sqrt{(X - \mu)'\Sigma^{-1}(X - \mu)}}\right) \stackrel{d}{=} (R, S), \tag{3.49}$$

where $S$ is uniformly distributed on $\mathcal{S}^{d-1}$ and $R$ is an independent scalar rv. This
forms the basis of a test of elliptical symmetry described in Section 3.3.5.

The following proposition shows that a particular conditional distribution of an 
elliptically distributed random vector X has the same correlation matrix as X and 
can also be used to test for elliptical symmetry. 


Proposition 3.29. Let $X \sim E_d(\mu, \Sigma, \psi)$ and assume $\Sigma$ is positive definite and
$\operatorname{cov}(X)$ is finite. For any $c \geq 0$ such that $P((X - \mu)'\Sigma^{-1}(X - \mu) \geq c) > 0$ we
have

$$\rho(X \mid (X - \mu)'\Sigma^{-1}(X - \mu) \geq c) = \rho(X). \tag{3.50}$$

Proof. It follows easily from (3.49) that

$$X \mid (X - \mu)'\Sigma^{-1}(X - \mu) \geq c \;\stackrel{d}{=}\; \mu + R\Sigma^{1/2}S \mid R^2 \geq c,$$

where $R \stackrel{d}{=} \sqrt{(X - \mu)'\Sigma^{-1}(X - \mu)}$ and $S$ is independent of $R$ and uniformly
distributed on $\mathcal{S}^{d-1}$. Thus we have

$$X \mid (X - \mu)'\Sigma^{-1}(X - \mu) \geq c \;\stackrel{d}{=}\; \mu + \tilde{R}\Sigma^{1/2}S,$$

where $\tilde{R} \stackrel{d}{=} R \mid R^2 \geq c$. It follows from Proposition 3.28 that the conditional
distribution remains elliptical with dispersion matrix $\Sigma$ and (3.50) holds.

3.3.3 Properties of Elliptical Distributions


We now summarize some of the properties of elliptical distributions in a format that 
allows their comparison with the properties of multivariate normal distributions in 
Section 3.1.3. Many properties carry over directly and others need only be slightly 
modified. These parallels emphasize that it would be fairly easy to base many stan- 
dard procedures in risk management on an assumption that risk-factor changes have 
an approximately elliptical distribution, rather than the patently false assumption 
that they are multivariate normal. 


Linear combinations. If we take linear combinations of elliptical random vectors,
then these remain elliptical with the same characteristic generator $\psi$. Let
$X \sim E_d(\mu, \Sigma, \psi)$ and take any $B \in \mathbb{R}^{k \times d}$ and $b \in \mathbb{R}^k$. Then it is easily shown, using a
similar argument to that in Proposition 3.9, that

$$BX + b \sim E_k(B\mu + b, B\Sigma B', \psi). \tag{3.51}$$

As a special case, if $a \in \mathbb{R}^d$, then

$$a'X \sim E_1(a'\mu, a'\Sigma a, \psi). \tag{3.52}$$


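Marginal distributions. It follows from (3.52) that marginal distributions of $X$
must be elliptical distributions with the same characteristic generator. Using the
$X = (X_1', X_2')'$ notation from Section 3.1.1 and again extending this notation naturally
to $\mu$ and $\Sigma$:

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

we have that $X_1 \sim E_k(\mu_1, \Sigma_{11}, \psi)$ and $X_2 \sim E_{d-k}(\mu_2, \Sigma_{22}, \psi)$.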


Conditional distributions. The conditional distribution of X2 given X; may also 
be shown to be elliptical, although in general with a different characteristic generator
$\tilde{\psi}$. For details of how the generator changes see Fang, Kotz and Ng (1987, pp. 45, 46).
In the special case of multivariate normality the generator remains the same. 


Quadratic forms. If $X \sim E_d(\mu, \Sigma, \psi)$ with $\Sigma$ non-singular, then we observed
in (3.49) that

$$Q := (X - \mu)'\Sigma^{-1}(X - \mu) \stackrel{d}{=} R^2, \tag{3.53}$$

where $R$ is the radial rv in the stochastic representation (3.41). As we have seen
in Example 3.24, for some particular cases the distribution of $R^2$ is well known: if
$X \sim N_d(\mu, \Sigma)$, then $R^2 \sim \chi_d^2$; if $X \sim t_d(\nu, \mu, \Sigma)$, then $R^2/d \sim F(d, \nu)$. For all
elliptical distributions $Q$ must be independent of $\Sigma^{-1/2}(X - \mu)/\sqrt{Q}$.


Convolutions. The convolution of two independent elliptical vectors with the same
dispersion matrix $\Sigma$ is also elliptical. If $X$ and $Y$ are independent $d$-dimensional
random vectors satisfying $X \sim E_d(\mu, \Sigma, \psi)$ and $Y \sim E_d(\tilde{\mu}, \Sigma, \tilde{\psi})$, then we may
take the product of characteristic functions to show that

$$X + Y \sim E_d(\mu + \tilde{\mu}, \Sigma, \bar{\psi}), \tag{3.54}$$

where $\bar{\psi}(u) = \psi(u)\tilde{\psi}(u)$.

If the dispersion matrices of $X$ and $Y$ differ by more than a constant factor, then
the convolution will not necessarily remain elliptical, even when the two generators
$\psi$ and $\tilde{\psi}$ are identical.
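
As a quick sanity check of (3.54), consider the one case where the answer is already familiar: two independent multivariate normal vectors with the same dispersion matrix. The worked step below is our own illustration and uses only (3.54) together with the rescaling noted in Remark 3.27.

```latex
\begin{align*}
X &\sim N_d(\mu, \Sigma) = E_d(\mu, \Sigma, \psi), \qquad
Y \sim N_d(\tilde{\mu}, \Sigma) = E_d(\tilde{\mu}, \Sigma, \psi), \qquad
\psi(u) = \exp(-\tfrac{1}{2}u), \\
\bar{\psi}(u) &= \psi(u)\,\psi(u) = \exp(-u) = \psi(2u), \\
X + Y &\sim E_d(\mu + \tilde{\mu}, \Sigma, \psi(2\,\cdot))
      = E_d(\mu + \tilde{\mu}, 2\Sigma, \psi)
      = N_d(\mu + \tilde{\mu}, 2\Sigma).
\end{align*}
```

The last line recovers the usual result that the sum of independent normal vectors is normal with summed means and covariances.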


3.3.4 Estimating Dispersion and Correlation 


Suppose we have risk-factor return data $X_1, \dots, X_n$ that we believe come from some
elliptical distribution $E_d(\mu, \Sigma, \psi)$ with heavier tails than the multivariate normal.
We recall from Remark 3.27 that the dispersion matrix $\Sigma$ is not uniquely determined,
but rather is only fixed up to a constant of proportionality; when covariances are
finite the covariance matrix is proportional to $\Sigma$.

In this section we consider briefly the problem of estimating the location parameter
$\mu$, a dispersion matrix $\Sigma$ and the correlation matrix $P$, assuming finiteness of
second moments. We could use the standard estimators of Section 3.1.2. Under an
assumption of iid or uncorrelated vector observations we observed that $\bar{X}$ and $S$
in (3.9) are unbiased estimators of the mean vector and covariance matrix, respectively.
They will also be consistent under quite weak assumptions. However, this
does not necessarily mean they are the best estimators of location and dispersion 
for any given finite sample of elliptical data. There are many alternative estimators 
that may be more efficient for heavy-tailed data and may enjoy better robustness 
properties for contaminated data. 

One strategy would be to fit a number of normal variance mixture models, such 
as the t and normal inverse Gaussian, using the approach of Section 3.2.5. From
the best-fitting model we would obtain an estimate of the mean vector and could 
easily calculate the implied estimates of the covariance and correlation matrices. In 
this section we give simpler, alternative methods that do not require a full fitting of 
a multivariate distribution; consult Notes and Comments for further references to 
robust dispersion estimation. 


M-estimators. Maronna’s M-estimators (Maronna 1976) of location and disper- 
sion are a relatively old idea in robust statistics, but they have the virtue of 
being particularly simple to implement. Let $\hat{\mu}$ and $\hat{\Sigma}$ denote estimates of the
mean vector and dispersion matrix. Suppose for every observation $X_i$ we calculate
$D_i^2 = (X_i - \hat{\mu})'\hat{\Sigma}^{-1}(X_i - \hat{\mu})$. If we wanted to calculate improved estimates
of location and dispersion, particularly for heavy-tailed data, it might be expected
that this could be achieved by reducing the influence of observations for which $D_i$
is large, since these are the observations that might tend to distort the parameter
estimates most. M-estimation uses decreasing weight functions $w_j : \mathbb{R}^+ \to \mathbb{R}^+$,
$j = 1, 2$, to downweight observations with large $D_i$ values. This can be turned
into an iterative procedure that converges to so-called M-estimates of location and 
dispersion; the dispersion matrix estimate is in general a biased estimate of the true 
covariance matrix. 


Algorithm 3.30 (M-estimators of location and dispersion).

(1) As starting estimates take $\hat{\mu}^{[1]} = \bar{X}$ and $\hat{\Sigma}^{[1]} = S$, the standard estimators
in (3.9). Set iteration count $k = 1$.

(2) For $i = 1, \dots, n$ set $D_i^2 = (X_i - \hat{\mu}^{[k]})'(\hat{\Sigma}^{[k]})^{-1}(X_i - \hat{\mu}^{[k]})$.

(3) Update the location estimate using

$$\hat{\mu}^{[k+1]} = \frac{\sum_{i=1}^n w_1(D_i) X_i}{\sum_{i=1}^n w_1(D_i)},$$

where $w_1$ is a weight function, as discussed below.

(4) Update the dispersion matrix estimate using

$$\hat{\Sigma}^{[k+1]} = \frac{1}{n} \sum_{i=1}^n w_2(D_i^2)(X_i - \hat{\mu}^{[k]})(X_i - \hat{\mu}^{[k]})',$$

where $w_2$ is a weight function.

(5) Set $k = k + 1$ and repeat steps (2)-(4) until estimates converge.


Popular choices for the weight functions $w_1$ and $w_2$ are the decreasing functions
$w_1(x) = (d + \nu)/(x^2 + \nu) = w_2(x^2)$, for some positive constant $\nu$. Interestingly,
use of these weight functions in Algorithm 3.30 corresponds exactly to fitting a
multivariate $t_d(\nu, \mu, \Sigma)$ distribution with known degrees of freedom $\nu$ using the
EM algorithm (see, for example, Meng and van Dyk 1997).

There are many other possibilities for the weight functions. For example, the
observations in the central part of the distribution could be given full weight and
only the more outlying observations downweighted. This can be achieved by setting
$w_1(x) = 1$ for $x \leq a$, $w_1(x) = a/x$ for $x > a$, for some value $a$, and
$w_2(x^2) = (w_1(x))^2$.
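
The steps of Algorithm 3.30 translate almost line by line into code. The following Python sketch (NumPy only) uses the t-type weights $w_1(x) = (d + \nu)/(x^2 + \nu) = w_2(x^2)$; the function name `m_estimate`, the stopping rule, the iteration cap and the simulated test data are assumptions of ours for illustration, not part of the algorithm as stated.

```python
import numpy as np

def m_estimate(X, nu=4.0, tol=1e-8, max_iter=500):
    """Sketch of Algorithm 3.30 with weights w1(x) = (d + nu)/(x^2 + nu) = w2(x^2)."""
    n, d = X.shape
    mu = X.mean(axis=0)                      # step (1): sample mean
    Sigma = np.cov(X, rowvar=False)          # step (1): sample covariance
    for _ in range(max_iter):
        dev = X - mu
        # step (2): squared distances D_i^2 under the current estimates
        D2 = np.einsum("ij,ij->i", dev @ np.linalg.inv(Sigma), dev)
        w = (d + nu) / (D2 + nu)             # w1(D_i) and w2(D_i^2) coincide here
        # step (3): weighted mean
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        # step (4): weighted scatter about the current location estimate
        Sigma_new = (w[:, None] * dev).T @ dev / n
        change = max(np.abs(mu_new - mu).max(), np.abs(Sigma_new - Sigma).max())
        mu, Sigma = mu_new, Sigma_new        # step (5): move to the next iteration
        if change < tol:                     # assumed convergence criterion
            break
    return mu, Sigma

# Illustrative use on simulated bivariate t data with nu = 4
rng = np.random.default_rng(2)
n, nu = 20_000, 4.0
Sigma_true = np.array([[1.0, 0.5], [0.5, 1.0]])
W = nu / rng.chisquare(nu, size=n)
X = np.sqrt(W)[:, None] * rng.multivariate_normal(np.zeros(2), Sigma_true, size=n)
mu_hat, Sigma_hat = m_estimate(X, nu=nu)
```

Because the weights match the degrees of freedom of the simulated data, `Sigma_hat` should approximate the dispersion matrix `Sigma_true` rather than the covariance matrix, which is twice as large when $\nu = 4$.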


Correlation estimates via Kendall’s tau. A method for estimating correlation that
is particularly easy to carry out is based on Kendall’s rank correlation coefficient and
will turn out to be related to a method for estimating the parameters of certain copulas
in Chapter 5. The theoretical version of Kendall’s rank correlation (also known as
Kendall’s tau) for two rvs $X_1$ and $X_2$ is denoted $\rho_\tau(X_1, X_2)$ and is defined formally
in Section 5.2.2; it is shown in Proposition 5.37 that if $(X_1, X_2) \sim E_2(\mu, \Sigma, \psi)$,
then

$$\rho_\tau(X_1, X_2) = \frac{2}{\pi} \arcsin(\rho), \tag{3.55}$$

where $\rho = \sigma_{12}/(\sigma_{11}\sigma_{22})^{1/2}$ is the pseudo-correlation coefficient of the elliptical
distribution, which is always defined (even when correlation coefficients are undefined
because variances are infinite). This relationship can be inverted to provide a
method for estimating $\rho$ from data; we simply replace the left-hand side of (3.55)
by the standard textbook estimator of Kendall’s tau, which is given in (5.50), to
get an estimating equation that is solved for $\hat{\rho}$. This method estimates correlation
by exploiting the geometry of an elliptical distribution and does not require us to 
estimate variances and covariances. 

The method can be used to estimate a correlation matrix of a higher-dimensional
elliptical distribution, by applying the technique to each bivariate margin. This does,
however, result in a matrix of pairwise correlation estimates that is not necessarily
positive definite; this problem does not always arise and if it does, a matrix
adjustment method can be used, such as the eigenvalue method of Rousseeuw and
Molenberghs (1993), which is given in Algorithm 5.55.

Figure 3.5. For 3000 independent samples of size 90 from a bivariate t distribution with
three degrees of freedom and linear correlation 0.5: (a) the standard (Pearson) estimator of
correlation; (b) the Kendall’s tau transform estimator. See Example 3.31 for commentary.

Note that, to turn an estimate of a bivariate correlation matrix into a robust estimate
of a dispersion matrix we could estimate the ratio of standard deviations
$\lambda = (\sigma_{22}/\sigma_{11})^{1/2}$, for example by using a ratio of trimmed sample standard deviations;
in other words, we leave out an equal number of outliers from each of the
univariate datasets $X_{1i}, \dots, X_{ni}$ for $i = 1, 2$ and calculate the sample standard
deviations with the remaining observations. This would give us the estimate

$$\hat{\Sigma} = \begin{pmatrix} 1 & \hat{\lambda}\hat{\rho} \\ \hat{\lambda}\hat{\rho} & \hat{\lambda}^2 \end{pmatrix}. \tag{3.56}$$


Example 3.31 (efficient correlation estimation for heavy-tailed data). Suppose 
we calculate correlations of asset or risk-factor returns based on 90 days (somewhat 
more than three trading months) of data; it would seem that this ought to be enough 
data to allow us to accurately estimate the “true” underlying correlation under an 
assumption that we have identically distributed data for that period. 

Figure 3.5 displays the results of a simulation experiment where we have generated 
3000 bivariate samples of iid data from a t distribution with three degrees of freedom
and correlation $\rho = 0.5$; this is a heavy-tailed elliptical distribution. The distribution
of the values of the standard correlation coefficient (also known as the Pearson 
correlation coefficient) is not particularly closely concentrated around the true value 
and produces some very poor estimates for a number of samples. On the other hand 
the Kendall’s tau transform method produces estimates that are in general much 
closer to the true value, and thus provides a more efficient way of estimating p. 
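
An experiment in the spirit of Example 3.31 can be sketched as follows (Python with NumPy). The pairwise-sign implementation of Kendall’s tau is assumed to correspond to the standard estimator referred to in the text, and the function names, seed and summary statistics are illustrative choices of ours; the results of such a run are not the data behind Figure 3.5.

```python
import numpy as np

def kendall_tau(x, y):
    """Sample Kendall's tau: mean concordance sign over all pairs (no tie correction)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (dx * dy).sum() / (n * (n - 1))

def tau_transform_corr(x, y):
    """Invert (3.55): rho = sin(pi * tau / 2)."""
    return np.sin(0.5 * np.pi * kendall_tau(x, y))

rng = np.random.default_rng(3)                 # arbitrary seed
rho, nu, n, n_rep = 0.5, 3.0, 90, 3000
P = np.array([[1.0, rho], [rho, 1.0]])

pearson = np.empty(n_rep)
kendall = np.empty(n_rep)
for j in range(n_rep):
    # Bivariate t sample: normal vector scaled by sqrt(W), nu/W ~ chi-squared(nu)
    W = nu / rng.chisquare(nu, size=n)
    X = np.sqrt(W)[:, None] * rng.multivariate_normal(np.zeros(2), P, size=n)
    pearson[j] = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
    kendall[j] = tau_transform_corr(X[:, 0], X[:, 1])

# Root mean squared errors of the two estimators around the true value
rmse_pearson = np.sqrt(np.mean((pearson - rho) ** 2))
rmse_kendall = np.sqrt(np.mean((kendall - rho) ** 2))
```

Comparing `rmse_kendall` with `rmse_pearson`, or plotting the two arrays side by side, gives a numerical counterpart to the visual comparison described in the example.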
3.3.5 Testing for Elliptical Symmetry


The general problem of this section is to test whether a sample of identically distributed
data vectors $X_1, \dots, X_n$ has an elliptical distribution $E_d(\mu, \Sigma, \psi)$ for some
$\mu$, $\Sigma$ and generator $\psi$. In all of the methods we require estimates of $\mu$ and $\Sigma$ and
these can be obtained using approaches discussed in Section 3.3.4, such as fitting
t distributions, calculating M-estimates or perhaps using (3.56) in the bivariate case.
We denote the estimates simply by $\hat{\mu}$ and $\hat{\Sigma}$.

Generally in finance we cannot assume that the observations are of iid random 
vectors, but we assume that they at least have an identical distribution. Note that, 
even if the data were independent, the fact that we generally estimate $\mu$ and $\Sigma$ from
the whole dataset would introduce dependence in the procedures that we describe 
below. 


Stable correlation estimates: an exploratory method. An easy exploratory graphical
method can be based on Proposition 3.29. We could attempt to estimate

$$\rho(X \mid h(X) \geq c), \qquad h(x) = (x - \hat{\mu})'\hat{\Sigma}^{-1}(x - \hat{\mu}),$$

for various values of $c \geq 0$. We expect that for elliptically distributed data the
estimates will remain roughly stable over a range of different $c$ values. Of course
the estimates of this correlation should again be calculated using some method that
is more efficient than the standard correlation estimator for heavy-tailed data. The
method is most natural as a bivariate method and in this case the correlation of
$X \mid h(X) \geq c$ can be estimated by applying the Kendall’s tau transform method to
those data points $X_i$ which lie outside the ellipse defined by $h(x) = c$. In Figure 3.6
we give an example with both simulated and real data, neither of which shows any
marked departure from the assumption of stable correlations. The method is of
course exploratory and does not allow us to come to any formal conclusion.
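
A rough sketch of this exploratory procedure on simulated bivariate t data is given below (Python with NumPy and SciPy). As a simplification we standardize with the sample mean and sample covariance matrix rather than a more robust estimate, and the list of ellipse sizes is an arbitrary choice of ours; `scipy.stats.kendalltau` supplies the rank correlation.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(4)                 # arbitrary seed
n, nu, rho = 2000, 4.0, 0.5
P = np.array([[1.0, rho], [rho, 1.0]])
W = nu / rng.chisquare(nu, size=n)
X = np.sqrt(W)[:, None] * rng.multivariate_normal(np.zeros(2), P, size=n)

mu_hat = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False)   # simplification: a robust estimate could be used
dev = X - mu_hat
h = np.einsum("ij,ij->i", dev @ np.linalg.inv(Sigma_hat), dev)   # h(X_i) for each point

# Choose thresholds c so that a given number of points lies on or outside the ellipse
n_outside = [2000, 1000, 500, 250, 100, 40]    # assumed grid of ellipse sizes
h_sorted = np.sort(h)
estimates = []
for m in n_outside:
    c = h_sorted[n - m]                        # exactly m points satisfy h(X_i) >= c
    keep = h >= c
    tau, _ = kendalltau(X[keep, 0], X[keep, 1])
    estimates.append(np.sin(0.5 * np.pi * tau))  # Kendall's tau transform, from (3.55)
```

Plotting `estimates` against `n_outside` gives a stability plot of the kind described; the estimates based on few points are naturally much noisier.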


QQplots. The remaining methods that we describe rely on the link (3.48) between
non-singular elliptical and spherical distributions. If $\mu$ and $\Sigma$ were known, then we
would test for elliptical symmetry by testing the data $\{\Sigma^{-1/2}(X_i - \mu) : i = 1, \dots, n\}$
for spherical symmetry. Replacing these parameters by estimates as above we consider
whether the data

$$\{Y_i = \hat{\Sigma}^{-1/2}(X_i - \hat{\mu}) : i = 1, \dots, n\} \tag{3.57}$$

are consistent with a spherical distribution, while ignoring the effect of estimation
error.

Some graphical methods based on QQplots have been suggested by Li, Fang and
Zhu (1997) and these are particularly useful for large $d$. These rely essentially on
the following result.

Lemma 3.32. Suppose that $T(Y)$ is a statistic such that, almost surely,
$$T(aY) = T(Y) \quad \text{for every } a > 0. \tag{3.58}$$
Then $T(Y)$ has the same distribution for every spherical vector $Y \sim S_d^+(\psi)$.

Proof. From Theorem 3.22 we have $T(Y) \stackrel{d}{=} T(RS)$ and $T(RS) = T(S)$ follows
from (3.58). Since the distribution of $T(Y)$ only depends on $S$ and not $R$ it must be
the same for all $Y \sim S_d^+(\psi)$.

Figure 3.6. Correlations are estimated using the Kendall’s tau method for points lying
outside ellipses of progressively larger size (as shown in (a) and (c)). (a), (b) Two thousand
t-distributed data with four degrees of freedom and $\rho = 0.5$. (c), (d) Two thousand daily
log-returns on Microsoft and Intel. Dashed lines and points show estimates for an ellipse
that is allowed to grow until there are only 40 points outside; dotted lines show estimates of
correlation for all data.


We exploit this result by looking for statistics $T(Y)$ with the property (3.58) whose
distribution we know when $Y \sim N_d(0, I_d)$. Two examples are

$$T_1(Y) = \frac{d^{1/2}\,\bar{Y}}{\sqrt{\frac{1}{d-1}\sum_{i=1}^d (Y_i - \bar{Y})^2}}, \quad \bar{Y} = \frac{1}{d}\sum_{i=1}^d Y_i, \qquad T_2(Y) = \frac{\sum_{i=1}^k Y_i^2}{\sum_{i=1}^d Y_i^2}. \tag{3.59}$$

For $Y \sim N_d(0, I_d)$, and hence for $Y \sim S_d^+(\psi)$, we have $T_1(Y) \sim t_{d-1}$ and
$T_2(Y) \sim \operatorname{Beta}(\tfrac{1}{2}k, \tfrac{1}{2}(d - k))$.

Our experience suggests that the beta-plot is the more revealing of the resulting
QQplots. Li, Fang and Zhu (1997) suggest choosing $k$ such that it is roughly equal to
$d - k$. In Figure 3.7 we show examples of the QQplots obtained for 2000 simulated
data from a 10-dimensional t distribution with four degrees of freedom and for
the daily, weekly and monthly return data on 10 Dow Jones 30 stocks analysed
in Example 3.3 and Section 3.2.5. The curvature in the plots for daily and weekly
returns seems to be evidence against the elliptical hypothesis.

Figure 3.7. QQplots of the beta-statistic (3.59) for four datasets with dimension $d = 10$;
we have set $k = 5$. (a) Two thousand simulated observations from t distribution with four
degrees of freedom. (b) Daily, (c) weekly and (d) monthly returns on Dow Jones stocks as
analysed in Example 3.3 and Section 3.2.5. Daily and weekly returns show evidence against
elliptical symmetry.
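
The coordinates of a beta QQplot of this kind can be computed along the following lines (Python with NumPy and SciPy). The sketch simulates spherical t data directly in place of the standardized data (3.57), so estimation error is absent by construction; the plotting positions $(i - 0.5)/n$ and all names are assumptions of ours.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(5)                 # arbitrary seed
n, d, k, nu = 2000, 10, 5, 4.0

# Simulated spherical t data standing in for the standardized data Y_i of (3.57)
W = nu / rng.chisquare(nu, size=n)
Y = np.sqrt(W)[:, None] * rng.standard_normal((n, d))

# Beta statistic of (3.59): share of the squared norm carried by the first k components
T2 = (Y[:, :k] ** 2).sum(axis=1) / (Y ** 2).sum(axis=1)

# QQplot coordinates: ordered statistics against Beta(k/2, (d - k)/2) quantiles
probs = (np.arange(1, n + 1) - 0.5) / n        # assumed plotting positions
theoretical = beta.ppf(probs, 0.5 * k, 0.5 * (d - k))
empirical = np.sort(T2)
max_gap = np.abs(empirical - theoretical).max()
```

Plotting `empirical` against `theoretical` should give points close to the diagonal for spherical data; systematic curvature would point away from spherical symmetry.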


Numerical tests. We restrict ourselves to simple ideas for bivariate tests; references
to more general test ideas are found in Notes and Comments. If we neglect the
error involved in estimating location and dispersion, testing for elliptical symmetry
amounts to testing the $Y_i$ data in (3.57) for spherical symmetry. For $i = 1, \dots, n$,
if we set $R_i = \|Y_i\|$ and $S_i = Y_i/\|Y_i\|$, then under the null hypothesis the $S_i$ data
should be uniformly distributed on the unit sphere $\mathcal{S}^{d-1}$ and the paired data $(R_i, S_i)$
should form realizations of independent pairs.

In the bivariate case, testing for uniformity on the unit circle $\mathcal{S}^1$ amounts to a
univariate test of uniformity on $[0, 2\pi]$ for the angles $\Theta_i$ described by the points
$S_i = (\cos\Theta_i, \sin\Theta_i)'$ on the perimeter of the circle; equivalently, we may test the
data $\{U_i := \Theta_i/(2\pi) : i = 1, \dots, n\}$ for uniformity on $[0, 1]$. Neglecting issues
of serial dependence in the data, this may be done, for instance, by a standard
chi-squared goodness-of-fit test (see Rice 1995, p. 241) or a Kolmogorov-Smirnov test
(see Conover 1999). Again neglecting issues of serial dependence, the independence
of the components of the pairs $\{(R_i, U_i) : i = 1, \dots, n\}$ could be examined by
performing a test of association with Spearman’s rank correlation coefficient (see,
for example, Conover 1999, pp. 312-328).
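
The three bivariate tests can be sketched with standard SciPy routines as follows. The data here are simulated spherical t vectors standing in for the $Y_i$ of (3.57); the number of bins in the chi-squared test (20) and the seed are arbitrary assumptions, and serial dependence and estimation error are ignored, as in the text.

```python
import numpy as np
from scipy.stats import chisquare, kstest, spearmanr

rng = np.random.default_rng(6)                 # arbitrary seed
n, nu = 2000, 4.0

# Simulated spherical bivariate t data standing in for the Y_i of (3.57)
W = nu / rng.chisquare(nu, size=n)
Y = np.sqrt(W)[:, None] * rng.standard_normal((n, 2))

R = np.linalg.norm(Y, axis=1)                        # R_i = ||Y_i||
theta = np.arctan2(Y[:, 1], Y[:, 0]) % (2 * np.pi)   # angles mapped to [0, 2*pi)
U = theta / (2 * np.pi)                              # U_i on the [0, 1] scale

# Uniformity of U: chi-squared test on equal-width bins and Kolmogorov-Smirnov test
counts, _ = np.histogram(U, bins=20, range=(0.0, 1.0))
p_chi2 = chisquare(counts).pvalue
p_ks = kstest(U, "uniform").pvalue

# Independence of R and U: Spearman rank correlation test of association
p_spearman = spearmanr(R, U).pvalue
```

Small p-values in any of the three tests would count as evidence against spherical symmetry of the standardized data.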

We have performed these tests for the two datasets used in Figure 3.6, these being
2000 simulated bivariate t data with four degrees of freedom and 2000 daily
log-returns for Intel and Microsoft. In Figure 3.8 the transformed data on the unit circle
$S_i$ and the implied angles $U_i$ on the [0, 1] scale are shown; the dispersion matrices
have been estimated using the construction (3.56) based on Kendall’s tau. Neither
of these datasets shows significant evidence against the elliptical hypothesis. For the
bivariate t data the p-values for the chi-squared and Kolmogorov-Smirnov tests of
uniformity and the Spearman’s rank test of association are, respectively, 0.99, 0.90
and 0.10. For the stock-return data they are 0.08, 0.12 and 0.19. Note that simulated
data from lightly skewed members of the generalized hyperbolic family often do
fail these tests.

Figure 3.8. Illustration of the transformation of bivariate data to points on the unit circle
$\mathcal{S}^1$ using the transformation $S_i = Y_i/\|Y_i\|$, where the $Y_i$ data are defined in (3.57); the
angles of these points are then transformed on to the [0, 1] scale, where they can be tested for
uniformity. (a) Two thousand simulated t data with four degrees of freedom. (b) Two thousand
Intel and Microsoft log-returns. Neither shows strong evidence against elliptical symmetry.


Notes and Comments 


A comprehensive reference for spherical and elliptical distributions is Fang, Kotz and 
Ng (1987); we have based our brief presentation of the theory on this account. Other 
references for the theory are Kelker (1970), Cambanis, Huang and Simons (1981) 
and Bingham and Kiesel (2002), the latter in the context of financial modelling. The 
original reference for Theorem 3.22 is Schoenberg (1938). Frahm (2004) suggests 
a generalization of the elliptical class to allow asymmetric models while preserving 
many of the attractive properties of the elliptical distributions. 

There is a vast literature on alternative estimators of dispersion and correlation 
matrices, particularly with regard to better robustness properties. Textbooks with
relevant sections include Huber (1981), Hampel et al. (1986), Marazzi (1993), Wilcox
(1997) and Dell’Aquila and Ronchetti (2005); the latter book is recommended in
general for applications of robust statistics in econometrics and finance.

We have concentrated on M-estimation of dispersion matrices, since this is related 
to the maximum likelihood estimation of alternative elliptical models. M-estimators 
have a relatively long history and are known to have good local robustness prop- 
erties (insensitivity to small data perturbations); they do, however, have relatively 
low breakdown points in high dimensions, so their performance can be poor under 
larger contaminations of the data. A small selection of papers on M-estimation 
includes Maronna (1976), Devlin, Gnanadesikan and Kettenring (1975, 1981) and 
Tyler (1983, 1987); see also Frahm (2004), in which an interesting alternative deriva- 
tion of a Tyler estimator is given. The method based on Kendall’s tau was suggested 
in Lindskog, McNeil and Schmock (2003). 

The QQplots for testing spherical symmetry were suggested by Li, Fang and Zhu 
(1997). There is a large literature on tests of spherical symmetry, including Smith 
(1977), Kariya and Eaton (1977), Beran (1979) and Baringhaus (1991). This work 
is also related to tests of uniformity for directional data: see Mardia (1972), Giné 
(1975) and Prentice (1978). 


3.4 Dimension Reduction Techniques 


The techniques of dimension reduction, such as factor modelling and principal com- 
ponents, are central to multivariate statistical analysis and are widely used in econo- 
metric model building. In the high-dimensional world of financial risk management 
they are essential tools. For this reason, and also because we will build on the tech- 
niques in some of the multivariate time series models described in Chapter 4, we 
include a concise summary of the more important information. For further read- 
ing and more detail it will be necessary to consult references listed in Notes and 
Comments. 


3.4.1 Factor Models 


By using a factor model we attempt to explain the randomness in the components 
of a d-dimensional vector X in terms of a smaller set of common factors. If the 
components of X represent, for example, equity returns, it is clear that a large part 
of their variation can be explained in terms of the variation of a smaller set of market 
index returns. Formally we define a factor model as follows. 


Definition 3.33 (linear factor model). The random vector $X$ is said to follow a
p-factor model if it can be decomposed as

$$X = a + BF + \varepsilon, \tag{3.60}$$

where

(i) $F = (F_1, \dots, F_p)'$ is a random vector of common factors with $p < d$ and a
covariance matrix that is positive definite;

(ii) $\varepsilon = (\varepsilon_1, \dots, \varepsilon_d)'$ is a random vector of idiosyncratic error terms, which are
uncorrelated and have mean zero;

(iii) $B \in \mathbb{R}^{d \times p}$ is a matrix of constant factor loadings and $a \in \mathbb{R}^d$ is a vector of
constants; and

(iv) $\operatorname{cov}(F, \varepsilon) = E((F - E(F))\varepsilon') = 0$.


The assumptions that the errors are uncorrelated with each other (ii) and also 
with the common factors (iv) are an important part of this definition. We do not in 
general require independence, only uncorrelatedness. However, if the vector X is 
multivariate normally distributed and follows the factor model in (3.60), then it is 
possible to find a version of the factor model where $F$ and $\varepsilon$ are Gaussian and the
errors can be assumed to be mutually independent and independent of the common 
factors. We elaborate on this assertion in Example 3.34 below. 

It follows from the basic assumptions that factor models imply a special structure
for the covariance matrix $\Sigma$ of $X$. If we denote the covariance matrix of $F$ by $\Omega$
and that of $\varepsilon$ by the diagonal matrix $\Upsilon$, it follows that

$$\Sigma = \operatorname{cov}(X) = B\Omega B' + \Upsilon. \tag{3.61}$$

If the factor model holds, the common factors can always be transformed so that
they are mean zero and orthogonal. By setting $F^* = \Omega^{-1/2}(F - E(F))$ and
$B^* = B\Omega^{1/2}$ we have a representation of the factor model of the form $X = \mu + B^*F^* + \varepsilon$,
where $\mu = E(X)$ as usual and $\Sigma = B^*(B^*)' + \Upsilon$.

Conversely, it can be shown that whenever a random vector $X$ has a covariance
matrix which satisfies

$$\Sigma = BB' + \Upsilon \tag{3.62}$$

for some $B \in \mathbb{R}^{d \times p}$ with $\operatorname{rank}(B) = p < d$ and diagonal matrix $\Upsilon$, then $X$
has a factor-model representation for some p-dimensional factor vector $F$ and
d-dimensional error vector $\varepsilon$.
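As a numerical illustration of the covariance structure (3.61) and of the transformation to orthogonal factors, the following sketch builds a covariance matrix from arbitrary ingredients and confirms that the transformed loadings reproduce it. The dimensions and the randomly generated `B`, `Omega` and `Upsilon` are hypothetical choices made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 5, 2

B = rng.normal(size=(d, p))                   # hypothetical factor loadings
A = rng.normal(size=(p, p))
Omega = A @ A.T + np.eye(p)                   # positive-definite cov(F)
Upsilon = np.diag(rng.uniform(0.1, 0.5, d))   # diagonal cov(epsilon)

# Covariance matrix implied by the factor model, as in (3.61).
Sigma = B @ Omega @ B.T + Upsilon

# Symmetric square root of Omega via its spectral decomposition.
w, V = np.linalg.eigh(Omega)
Omega_half = V @ np.diag(np.sqrt(w)) @ V.T

# Loadings for the mean-zero, orthogonal factors F* = Omega^{-1/2}(F - E(F)).
B_star = B @ Omega_half
Sigma_star = B_star @ B_star.T + Upsilon

print(np.allclose(Sigma, Sigma_star))
```

The check succeeds because $B^*(B^*)' = B\Omega^{1/2}\Omega^{1/2}B' = B\Omega B'$, so rescaling the factors changes the loadings but not the implied covariance matrix.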


Example 3.34 (the equicorrelation model). Suppose $X$ is a random vector with
standardized margins (zero mean and unit variance) and an equicorrelation matrix;
in other words, the correlation between each pair of components is equal to $\rho > 0$.
This means that the covariance matrix $\Sigma$ can be written as $\Sigma = \rho J_d + (1 - \rho)I_d$,
where $J_d$ is the d-dimensional square matrix of ones and $I_d$ is the identity matrix,
so that $\Sigma$ is obviously of the form (3.62) for the d-vector $B = \sqrt{\rho}\,1$.

To find a factor decomposition of $X$ take any zero-mean, unit-variance rv $Y$ that
is independent of $X$ and define a single common factor $F$ and errors $\varepsilon$ by

$$F = \frac{\sqrt{\rho}}{1 + \rho(d-1)} \sum_{j=1}^d X_j + \sqrt{\frac{1-\rho}{1+\rho(d-1)}}\, Y, \qquad \varepsilon_j = X_j - \sqrt{\rho}\,F,$$

where we note that in this construction $F$ also has mean zero and unit variance.
Thus we have the factor decomposition $X = BF + \varepsilon$ and it may be verified by
calculation that $\operatorname{cov}(F, \varepsilon_j) = 0$ for all j and $\operatorname{cov}(\varepsilon_j, \varepsilon_k) = 0$ when $j \ne k$, so
that the requirements of Definition 3.33 are satisfied. A random vector with an
equicorrelation matrix can be thought of as following a factor model with a single
common factor.

Since we can take any $Y$, the factors and errors in this decomposition are
non-unique. Consider the case where the vector $X$ is Gaussian; it is most convenient
to take $Y$ to also be Gaussian, since in that case the common factor is normally
distributed, the error vector is multivariate normally distributed, $F$ is independent
of $\varepsilon_j$, for all j, and $\varepsilon_j$ and $\varepsilon_k$ are independent for $j \ne k$. Since $\operatorname{var}(\varepsilon_j) = 1 - \rho$, it
is most convenient to write the factor model implied by the equicorrelation model
as

$$X_j = \sqrt{\rho}\,F + \sqrt{1-\rho}\,Z_j, \quad j = 1,\dots,d, \tag{3.63}$$

where $F, Z_1, \dots, Z_d$ are mutually independent standard Gaussian rvs. This model
will be used in Section 8.3.5 in the context of modelling homogeneous credit
portfolios. For the more general construction on which this example is based see Mardia,
Kent and Bibby (1979, Exercise 9.2.2).
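The Gaussian one-factor representation (3.63) is easy to check by simulation. The sketch below uses hypothetical values of d, n and ρ and simply confirms that the simulated components have approximately unit variance and pairwise correlation close to ρ.

```python
import numpy as np

rng = np.random.default_rng(42)
d, n, rho = 4, 200_000, 0.3       # hypothetical dimension, sample size, correlation

F = rng.standard_normal(n)        # single common factor
Z = rng.standard_normal((n, d))   # independent idiosyncratic terms

# X_j = sqrt(rho) * F + sqrt(1 - rho) * Z_j, as in (3.63).
X = np.sqrt(rho) * F[:, None] + np.sqrt(1 - rho) * Z

R = np.corrcoef(X, rowvar=False)
off_diag = R[~np.eye(d, dtype=bool)]   # all pairwise sample correlations

print("sample variances:", X.var(axis=0).round(3))
print("pairwise correlations range:", off_diag.min().round(3), off_diag.max().round(3))
```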


3.4.2 Statistical Calibration Strategies 


Now assume that we have data $X_1, \dots, X_n \in \mathbb{R}^d$ representing risk-factor returns.
Each vector observation $X_t$ recorded at a time t is assumed to be generated by the
factor model (3.60) for some common-factor vector $F_t$ and some error vector $\varepsilon_t$.

There are a number of different approaches to the practical calibration of a factor
model, depending on the situation and, in particular, on whether the factors are
observable or are considered to be unobservable (latent).

In an observable factor model we assume that appropriate factors for the return 
series in question have been identified in advance and data on these factors have 
been collected. A simple example would be a one-factor model where $F_1, \dots, F_n$
are observations of the return on a market index and $X_1, \dots, X_n$ are individual
equity returns to be explained in terms of the market return (a model known in 
econometrics as Sharpe’s single-index model). Fitting of the model (estimation of 
B and a) is accomplished by regression techniques, and is described in Section 3.4.3. 

In a latent factor model appropriate factors are themselves estimated from the 
data $X_1, \dots, X_n$. Here we envisage a situation where the $X_t$ represent returns for
a set of disparate risk factors and it is not clear a priori what the best set of factors 
might be. There are two general strategies for finding factors. The first strategy, 
which is quite common in finance, is to use the method of principal components to 
construct factors. We note that the factors we obtain, while being explanatory in a 
statistical sense, may not have any obvious interpretation. 

In the second approach, classical statistical factor analysis, it is assumed that 
the data are identically distributed with a distribution whose covariance matrix has 
the factor structure (3.62). Various techniques are used to estimate $B$ and $\Upsilon$ and
then these estimates are used in turn to construct factor data. We will not go into
the details of this method further; they are found in standard texts on multivariate
statistical analysis (see Notes and Comments).

In the context of risk management, the goal of all approaches to factor models is
to obtain factor data $F_t$ and loading matrices $B$ (and the constant vector $a$ where
relevant). If this is achieved, we can then concentrate on modelling the distribution
or dynamics of $F_1, \dots, F_n$, which is a lower-dimensional problem than modelling
$X_1, \dots, X_n$. The unobserved errors $\varepsilon_1, \dots, \varepsilon_n$ are of secondary importance. In
situations where we have many risk factors the risk embodied in the errors is partly
mitigated by a diversification effect, whereas the risk embodied in the common
factors remains. The following simple example gives an idea why this is the case.


Example 3.35. We continue our analysis of the one-factor model in Example 3.34.
Suppose the random vector $X$ in that example represents the return on d different
companies so that the rv $Z_{(d)} = (1/d)\sum_{j=1}^d X_j$ can be thought of as the portfolio
return for an equal investment in each of the companies. We calculate that

$$Z_{(d)} = \frac{1}{d}\,1'BF + \frac{1}{d}\,1'\varepsilon = \sqrt{\rho}\,F + \frac{1}{d}\sum_{j=1}^d \varepsilon_j.$$

The risk in the first term is not affected by increasing the size of the portfolio d,
whereas the risk in the second term can be reduced. Suppose we measure risk by
simply calculating variances; we get

$$\operatorname{var}(Z_{(d)}) = \rho + \frac{1-\rho}{d} \to \rho, \quad d \to \infty,$$

so that the systematic factor is the main contributor to the risk in a large-portfolio
situation.
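The diversification effect in Example 3.35 can be reproduced with a short calculation. The sketch below evaluates the variance of the equally weighted portfolio directly from the equicorrelation matrix and compares it with the closed form ρ + (1 − ρ)/d; the value of ρ and the portfolio sizes are hypothetical, and `portfolio_variance` is an illustrative helper name.

```python
import numpy as np


def portfolio_variance(rho, d):
    """Variance of the equally weighted portfolio under equicorrelation rho."""
    Sigma = rho * np.ones((d, d)) + (1 - rho) * np.eye(d)
    w = np.full(d, 1.0 / d)
    return float(w @ Sigma @ w)


rho = 0.3
sizes = [1, 10, 100, 1000]
variances = {d: portfolio_variance(rho, d) for d in sizes}

for d in sizes:
    closed_form = rho + (1 - rho) / d
    print(f"d = {d:5d}  var = {variances[d]:.4f}  closed form = {closed_form:.4f}")
```

As d grows the idiosyncratic contribution (1 − ρ)/d vanishes, while the systematic contribution ρ is unaffected by the portfolio size.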


We now discuss in a little more detail the fitting of observable factor models 
by regression. The approach to factor models based on principal components is 
described in Section 3.4.4. Principal component analysis is covered there in some 
detail, since it is an important technique in its own right. 


3.4.3 Regression Analysis of Factor Models 


Two equivalent approaches may be used to estimate the model parameters. We write
the model as

$$X_t = a + BF_t + \varepsilon_t, \quad t = 1,\dots,n, \tag{3.64}$$

where $X_t$ and $F_t$ are vectors of individual returns and factors (for example, index
returns) at time t, and $a$ and $B$ are parameters to be estimated. In the first approach
we perform d univariate regression analyses, one for each component of the individual
return series. In the second approach we estimate all parameters in a single
multivariate regression.


Univariate regression. Writing $X_{t,j}$ for the observation at time t of instrument j
we consider the univariate regression model

$$X_{t,j} = a_j + b_j'F_t + \varepsilon_{t,j}, \quad t = 1,\dots,n.$$

This is known as a time series regression, since the responses $X_{1,j}, \dots, X_{n,j}$ form a
univariate time series and the factors $F_1, \dots, F_n$ form a possibly multivariate time
series. Without going into technical details or anticipating any of the time series
material in Chapter 4 we simply remark that the parameters $a_j$ and $b_j$ are estimated
using the standard ordinary least-squares (OLS) method found in all textbooks on
linear regression. To justify the use of the method and to derive statistical properties
of the method it is usually assumed that, conditional on the factors, the errors
$\varepsilon_{1,j}, \dots, \varepsilon_{n,j}$ are identically distributed and serially uncorrelated. (This means they
form what will be referred to in Chapter 4 as white noise.)

The estimate $\hat{a}_j$ obviously estimates the jth component of $a$, while $\hat{b}_j$ is an
estimate of the jth row of the matrix $B$. By performing a regression for each of the
univariate time series $X_{1,j}, \dots, X_{n,j}$ for $j = 1,\dots,d$, we complete the estimation
of the parameters $a$ and $B$.


Multivariate regression. To set the problem up as a multivariate linear-regression
problem, we construct a number of large matrices:

$$X = \begin{pmatrix} X_1' \\ \vdots \\ X_n' \end{pmatrix} \in \mathbb{R}^{n \times d}, \quad
F = \begin{pmatrix} 1 & F_1' \\ \vdots & \vdots \\ 1 & F_n' \end{pmatrix} \in \mathbb{R}^{n \times (p+1)}, \quad
B_2 = \begin{pmatrix} a' \\ B' \end{pmatrix} \in \mathbb{R}^{(p+1) \times d}, \quad
E = \begin{pmatrix} \varepsilon_1' \\ \vdots \\ \varepsilon_n' \end{pmatrix} \in \mathbb{R}^{n \times d}.$$

Each row of the data $X$ corresponds to a vector observation at a fixed time point t,
and each column corresponds to a univariate time series for one of the individual
returns. The model (3.64) can then be expressed by the matrix equation

$$X = FB_2 + E, \tag{3.65}$$

where $B_2$ is the matrix of regression parameters to be estimated.

If we assume that the unobserved error vectors $\varepsilon_1, \dots, \varepsilon_n$ comprising the rows of
$E$ are identically distributed and serially uncorrelated, conditional on $F_1, \dots, F_n$,
then the equation (3.65) defines a standard multivariate linear regression (see, for
example, Mardia, Kent and Bibby (1979) for the standard assumptions). An estimate
of $B_2$ is obtained by multivariate OLS according to the formula

$$\hat{B}_2 = (F'F)^{-1}F'X. \tag{3.66}$$


The factor model is now essentially calibrated, since we have estimates for a 
and B. The model can now be critically examined with respect to the original conditions
of Definition 3.33. Do the error vectors $\varepsilon_t$ come from a distribution with
diagonal covariance matrix, and are they uncorrelated with the factors?
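The equivalence of the two estimation approaches can be illustrated numerically. The following sketch simulates data from a hypothetical two-factor model, computes the multivariate OLS estimate (3.66), and compares it with d separate univariate least-squares fits. The parameter values and variable names (`a_true`, `B_true`, `factors`) are assumptions made for the illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, p = 500, 3, 2

# Hypothetical "true" parameters and simulated observable factors.
a_true = np.array([0.1, -0.2, 0.0])
B_true = rng.normal(size=(d, p))
factors = rng.normal(size=(n, p))
errors = 0.5 * rng.normal(size=(n, d))
X = a_true + factors @ B_true.T + errors      # rows are X_t'

# Design matrix with a leading column of ones, of dimension n x (p + 1).
F = np.column_stack([np.ones(n), factors])

# Multivariate OLS, as in (3.66): B2_hat = (F'F)^{-1} F'X.
B2_hat = np.linalg.solve(F.T @ F, F.T @ X)
a_hat, B_hat = B2_hat[0], B2_hat[1:].T

# d univariate regressions, one per return series.
B2_cols = np.column_stack(
    [np.linalg.lstsq(F, X[:, j], rcond=None)[0] for j in range(d)]
)

print(np.allclose(B2_hat, B2_cols))
print("a_hat:", a_hat.round(3))
```

The first row of `B2_hat` estimates $a'$ and the remaining rows estimate $B'$, mirroring the layout of $B_2$.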

To learn something about the errors, we can form the model residual matrix
$\hat{E} = X - F\hat{B}_2$. Each row of this matrix contains an inferred value of an error vector
$\hat{\varepsilon}_t$ at a fixed point in time. Examination of the sample correlation matrix of these
inferred error vectors will hopefully show that there is little remaining correlation in
the errors (or at least much less than in the original data vectors $X_t$). If this is the case,
then the diagonal elements of the sample covariance matrix of the $\hat{\varepsilon}_t$ could be taken
as an estimator $\hat{\Upsilon}$ for $\Upsilon$. It is sometimes of interest to form the covariance matrix
implied by the factor model and compare this with the original sample covariance
matrix $S$ of the data. The implied covariance matrix is

$$\hat{\Sigma}^{(F)} = \hat{B}\hat{\Omega}\hat{B}' + \hat{\Upsilon}, \quad \text{where } \hat{\Omega} = \frac{1}{n-1}\sum_{t=1}^n (F_t - \bar{F})(F_t - \bar{F})'.$$

Table 3.6. The first line gives estimates of B for a multivariate regression model fitted to
10 Dow Jones 30 stocks where the observed common factor is the return on the Dow Jones
30 index itself. The second row gives $r^2$ values for a univariate regression model for each
individual time series. The next 10 lines of the table give the sample correlation matrix of the
data R, while the middle 10 lines give the correlation matrix implied by the factor model.
The final 10 lines show the estimated correlation matrix of the residuals from the regression
model, with entries less than 0.1 in absolute value being omitted. See Example 3.36 for full
details.

           MO     KO     EK    HWP   INTC   MSFT    IBM    MCD    WMT    DIS
B        0.87   1.01   0.77   1.12   1.12   1.11   1.07   0.86   1.02   1.03
r^2      0.17   0.33   0.14   0.18   0.17   0.21   0.22   0.23   0.24   0.26

MO       1.00   0.27   0.14   0.17   0.16   0.25   0.18   0.22   0.16   0.22
KO       0.27   1.00   0.17   0.22   0.21   0.25   0.18   0.36   0.33   0.32
EK       0.14   0.17   1.00   0.17   0.17   0.18   0.15   0.14   0.17   0.16
HWP      0.17   0.22   0.17   1.00   0.42   0.38   0.36   0.20   0.22   0.23
INTC     0.16   0.21   0.17   0.42   1.00   0.53   0.36   0.19   0.22   0.21
MSFT     0.25   0.25   0.18   0.38   0.53   1.00   0.33   0.22   0.28   0.26
IBM      0.18   0.18   0.15   0.36   0.36   0.33   1.00   0.20   0.20   0.20
MCD      0.22   0.36   0.14   0.20   0.19   0.22   0.20   1.00   0.26   0.26
WMT      0.16   0.33   0.17   0.22   0.22   0.28   0.20   0.26   1.00   0.28
DIS      0.22   0.32   0.16   0.23   0.21   0.26   0.20   0.26   0.28   1.00

MO       1.00   0.24   0.16   0.18   0.17   0.19   0.20   0.20   0.20   0.21
KO       0.24   1.00   0.22   0.24   0.23   0.26   0.27   0.28   0.28   0.29
EK       0.16   0.22   1.00   0.16   0.15   0.17   0.18   0.18   0.18   0.19
HWP      0.18   0.24   0.16   1.00   0.17   0.19   0.20   0.20   0.21   0.22
INTC     0.17   0.23   0.15   0.17   1.00   0.19   0.19   0.19   0.20   0.21
MSFT     0.19   0.26   0.17   0.19   0.19   1.00   0.22   0.22   0.22   0.23
IBM      0.20   0.27   0.18   0.20   0.19   0.22   1.00   0.23   0.23   0.24
MCD      0.20   0.28   0.18   0.20   0.19   0.22   0.23   1.00   0.23   0.24
WMT      0.20   0.28   0.18   0.21   0.20   0.22   0.23   0.23   1.00   0.25
DIS      0.21   0.29   0.19   0.22   0.21   0.23   0.24   0.24   0.25   1.00

MO       1.00
KO              1.00                                     -0.12   0.12
EK                     1.00
HWP                           1.00   0.30   0.24   0.20
INTC                          0.30   1.00   0.43   0.20
MSFT                          0.24   0.43   1.00   0.14
IBM            -0.12          0.20   0.20   0.14   1.00
MCD             0.12                                      1.00
WMT                                                              1.00
DIS                                                                     1.00


We would hope that $\hat{\Sigma}^{(F)}$ captured much of the structure of $S$ and that the correlation
matrix $R^{(F)} := \wp(\hat{\Sigma}^{(F)})$ captured much of the structure of the sample correlation
matrix $R = \wp(S)$.

Example 3.36 (single-index model for Dow Jones 30 returns). As a simple example
of the regression approach to fitting factor models we have fitted a single factor
model to a set of 10 Dow Jones 30 daily stock-return series from 1992 to 1998.
Note that these are different returns to those analysed in previous sections of this 
chapter. They have been chosen to be of two types: technology-related titles like 
Hewlett-Packard, Intel, Microsoft and IBM; and food- and consumer-related titles 
like Philip Morris, Coca-Cola, Eastman Kodak, McDonald’s, Wal-Mart and Disney. 
The factor chosen is the corresponding return on the Dow Jones 30 index itself. 

The estimate of B implied by formula (3.66) is shown in the first line of Table 3.6. 
The highest values of B correspond to so-called high beta stocks; since a one-factor 
model implies the relationship $E(X_j) = a_j + B_j E(F)$, these stocks potentially
offer high expected returns relative to the market (but are often riskier titles); in
this case the four technology-related stocks have the highest beta values. In the
second row, values of $r^2$, the so-called coefficient of determination, are given for
each of the univariate regression models. This number measures the strength of the
regression relationship between $X_j$ and $F$ and can be interpreted as the proportion
of the variation of the stock return that is explained by variation in the market return;
the highest $r^2$ corresponds to Coca-Cola (33%) and in general it seems that about
20% of individual stock-return variation is explained by market-return variation. 

The next 10 lines of the table give the sample correlation matrix of the data R, 
while the middle 10 lines give the correlation matrix implied by the factor model 
(corresponding to $\hat{\Sigma}^{(F)}$). The latter matrix picks up much, but not all, of the structure
of the former matrix. The final 10 lines show the estimated correlation matrix of the 
residuals from the regression model, but only those elements which exceed 0.1 in 
absolute value. The residuals are indeed much less correlated than the original data, 
but a few larger entries indicate imperfections in the factor-model representation 
of the data, particularly for the technology stocks. The index return for the broader 
market is clearly an important common factor but further systematic effects appear 
to be present in these data that are not captured by the index. 
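The diagnostics used in this example (residual correlations and the implied correlation matrix) can be sketched as follows. The code does not use the Dow Jones data; it runs on simulated single-index returns with hypothetical betas and volatilities, and `to_correlation` is an illustrative helper that plays the role of the correlation operator applied to a covariance matrix.

```python
import numpy as np


def to_correlation(cov):
    """Convert a covariance matrix into the corresponding correlation matrix."""
    s = np.sqrt(np.diag(cov))
    return cov / np.outer(s, s)


rng = np.random.default_rng(7)
n, d = 1000, 4

# Simulated single-index data (hypothetical betas and volatilities).
market = rng.normal(0.0, 0.01, size=n)
betas = np.array([0.8, 1.0, 1.1, 1.2])
X = np.outer(market, betas) + rng.normal(0.0, 0.015, size=(n, d))

# Multivariate OLS fit of the one-factor model.
F = np.column_stack([np.ones(n), market])
B2_hat = np.linalg.solve(F.T @ F, F.T @ X)
B_hat = B2_hat[1:].T                      # d x 1 matrix of estimated betas
E_hat = X - F @ B2_hat                    # residual matrix

# Ingredients of the implied covariance matrix.
Omega_hat = np.atleast_2d(np.cov(market, ddof=1))
Upsilon_hat = np.diag(np.diag(np.cov(E_hat, rowvar=False)))
Sigma_F = B_hat @ Omega_hat @ B_hat.T + Upsilon_hat

S = np.cov(X, rowvar=False)               # sample covariance of the data
R = to_correlation(S)                     # sample correlation matrix
R_F = to_correlation(Sigma_F)             # correlation implied by the model
R_resid = to_correlation(np.cov(E_hat, rowvar=False))

print("max |R - R_F|:", np.abs(R - R_F).max().round(3))
print("residual correlations:\n", R_resid.round(2))
```

Because the regression includes an intercept, the residuals are orthogonal to the factor in sample, so the diagonal of the implied covariance matrix matches the diagonal of S exactly; only the off-diagonal structure is approximated.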


3.4.4 Principal Component Analysis 


The aim of principal component analysis (PCA) is to reduce the dimensionality of 
highly correlated data by finding a small number of uncorrelated linear combinations 
that account for most of the variability of the original data, in some appropriately 
defined sense. PCA is not itself a model, but rather a data-rotation technique. How- 
ever, it can be used as a way of constructing appropriate factors for a factor model, 
and this is the main application we consider in this section. 

The key mathematical result behind the technique is the spectral decomposition
theorem of linear algebra, which says that any symmetric matrix $A \in \mathbb{R}^{d \times d}$ can be
written as

$$A = \Gamma\Lambda\Gamma', \tag{3.67}$$

where

(i) $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_d)$ is the diagonal matrix of eigenvalues of $A$ which,
without loss of generality, are ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$, and

(ii) $\Gamma$ is an orthogonal matrix satisfying $\Gamma\Gamma' = \Gamma'\Gamma = I_d$ whose columns are
standardized eigenvectors of $A$ (i.e. eigenvectors with length 1).


Theoretical principal components. Obviously we can apply this decomposition
to any covariance matrix $\Sigma$, and in this case the positive semidefiniteness of $\Sigma$
ensures that $\lambda_j \ge 0$ for all j. Suppose the random vector $X$ has mean vector $\mu$
and covariance matrix $\Sigma$ and we make the decomposition $\Sigma = \Gamma\Lambda\Gamma'$ as in (3.67).
Then the principal components transform of $X$ is defined to be

$$Y = \Gamma'(X - \mu), \tag{3.68}$$

and can be thought of as a rotation and a recentring of $X$. The jth component of
the rotated vector $Y$ is known as the jth principal component of $X$ and is given by

$$Y_j = \gamma_j'(X - \mu), \tag{3.69}$$

where $\gamma_j$ is the eigenvector of $\Sigma$ corresponding to the jth ordered eigenvalue; this
vector is also known as the jth vector of loadings.
Simple calculations reveal that

$$E(Y) = 0 \quad \text{and} \quad \operatorname{cov}(Y) = \Gamma'\Sigma\Gamma = \Gamma'\Gamma\Lambda\Gamma'\Gamma = \Lambda,$$

so that the components of $Y$ are uncorrelated and have variances
$\operatorname{var}(Y_j) = \lambda_j$ for all j. The components are thus ordered by variance, from largest to
smallest. Moreover, the first principal component can be shown to be the standardized
linear combination of $X$ which has maximal variance among all such
combinations; in other words,

$$\operatorname{var}(\gamma_1'X) = \max\{\operatorname{var}(a'X) : a'a = 1\}.$$


For j = 2,...,d, the jth principal component can be shown to be the standardized 
linear combination of X with maximal variance among all such linear combinations 
that are orthogonal to (and hence uncorrelated with) the first j — 1 linear combina- 
tions. The final dth principal component has minimum variance among standardized 
linear combinations of X. 

To measure the ability of the first few principal components to explain the
variability in $X$ we observe that

$$\sum_{j=1}^d \operatorname{var}(Y_j) = \sum_{j=1}^d \lambda_j = \operatorname{trace}(\Sigma) = \sum_{j=1}^d \operatorname{var}(X_j).$$

If we interpret $\operatorname{trace}(\Sigma) = \sum_{j=1}^d \operatorname{var}(X_j)$ as a measure of the total variability in
$X$, then, for $k \le d$, the ratio $\sum_{j=1}^k \lambda_j / \sum_{j=1}^d \lambda_j$ represents the amount of this
variability explained by the first k principal components.
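The properties of the theoretical principal components can be checked on a small example. The sketch below applies the spectral decomposition to a hypothetical 3 × 3 covariance matrix, reorders the eigenvalues from largest to smallest, and verifies the identities stated above, including the maximal-variance property of the first loading vector against randomly drawn unit vectors.

```python
import numpy as np

# Hypothetical positive-definite covariance matrix.
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 1.0, 0.3],
                  [0.8, 0.3, 0.5]])

# eigh returns eigenvalues in ascending order; reorder from largest to smallest.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]                   # lambda_1 >= ... >= lambda_d
Gamma = eigvecs[:, order]              # columns are the loading vectors gamma_j

cov_Y = Gamma.T @ Sigma @ Gamma        # covariance matrix of Y = Gamma'(X - mu)
explained = np.cumsum(lam) / lam.sum() # cumulative share of total variability

# Variances a' Sigma a for random unit vectors a never exceed lambda_1.
rng = np.random.default_rng(0)
a = rng.normal(size=(1000, 3))
a /= np.linalg.norm(a, axis=1, keepdims=True)
random_variances = np.einsum("ij,jk,ik->i", a, Sigma, a)

print("eigenvalues:", lam.round(4))
print("cumulative proportion explained:", explained.round(4))
print("largest variance among random directions:", random_variances.max().round(4))
```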


Sample principal components. Assume that we have multivariate data observations
$X_1, \dots, X_n$ with identical distribution and unknown covariance matrix $\Sigma$, which
we estimate by the sample covariance matrix

$$S_x = \frac{1}{n}\sum_{t=1}^n (X_t - \bar{X})(X_t - \bar{X})'.$$

We apply the spectral decomposition (3.67) to the symmetric, positive semidefinite
matrix $S_x$ to get

$$S_x = GLG',$$

where $G$ is the eigenvector matrix, $L = \operatorname{diag}(l_1, \dots, l_d)$ is the diagonal matrix
consisting of ordered eigenvalues, and we switch to roman letters to emphasize that
these are now calculated from an empirical covariance matrix.

The eigenvectors, or loading vectors, $g_j$ making up the columns of $G$ define the
empirical principal components transform and, by analogy with (3.69), the value
$Y_{t,j}$ given by

$$Y_{t,j} = g_j'(X_t - \bar{X})$$

can be considered to be an observation of the jth sample principal component at
time t. The vectors $Y_t = (Y_{t,1}, \dots, Y_{t,d})' = G'(X_t - \bar{X})$ constitute rotations of the
original data vectors $X_t$. The rotated data vectors $Y_1, \dots, Y_n$ have the property
that their sample covariance matrix is $L = \operatorname{diag}(l_1, \dots, l_d)$, the diagonal matrix of
eigenvalues of $S_x$, as is easily verified:

$$\frac{1}{n}\sum_{t=1}^n (Y_t - \bar{Y})(Y_t - \bar{Y})' = \frac{1}{n}\sum_{t=1}^n Y_tY_t' = \frac{1}{n}\sum_{t=1}^n G'(X_t - \bar{X})(X_t - \bar{X})'G = G'S_xG = L.$$

Thus the rotated vectors show no correlation between components and the components
are ordered by their sample variances, from largest to smallest.
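The sample version of the transform can be sketched in the same way. The code below works on simulated data (the mixing matrix and mean vector are hypothetical), uses the divisor n in the sample covariance matrix, and confirms that the rotated data have a diagonal sample covariance matrix with the ordered eigenvalues on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 500, 4

# Simulated correlated data with a non-zero mean (hypothetical inputs).
mixing = rng.normal(size=(d, d))
X = rng.normal(size=(n, d)) @ mixing.T + np.array([1.0, -1.0, 0.5, 0.0])

X_bar = X.mean(axis=0)
S_x = (X - X_bar).T @ (X - X_bar) / n      # sample covariance with divisor n

# Spectral decomposition S_x = G L G' with eigenvalues in decreasing order.
l, G = np.linalg.eigh(S_x)
order = np.argsort(l)[::-1]
l, G = l[order], G[:, order]

# Rotated data: row t holds Y_t' = (G'(X_t - X_bar))'.
Y = (X - X_bar) @ G
S_y = Y.T @ Y / n                          # sample covariance of the rotated data

print("ordered eigenvalues:", l.round(4))
print("max off-diagonal entry of S_y:", np.abs(S_y - np.diag(np.diag(S_y))).max())
```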


Remark 3.37. In a situation where the different components of the data vectors
$X_1, \dots, X_n$ have very different sample variances (particularly if they are measured
on very different scales), it is to be expected that the component (or components)
with largest variance will dominate the first loading vector $g_1$ and dominate the
first principal component. In these situations the data are often transformed to have
identical variances, which effectively means that principal components analysis
is applied to the sample correlation matrix $R_x$. Note also that we could derive
sample principal components from a robust estimate of the correlation matrix or a 
multivariate dispersion matrix. 


Principal components as factors. The principal components transform in (3.68) is
invertible, giving us $X = \mu + \Gamma Y$, where the random vector $Y$ contains the principal
components ordered from top to bottom by their variance. Let us suppose that we
believe that the first k components explain the most important portion of the total
variability of $X$. We could partition $Y$ according to $(Y_1', Y_2')'$, where $Y_1 \in \mathbb{R}^k$ and
$Y_2 \in \mathbb{R}^{d-k}$; similarly, we could partition $\Gamma$ according to $(\Gamma_1, \Gamma_2)$, where $\Gamma_1 \in \mathbb{R}^{d \times k}$
and $\Gamma_2 \in \mathbb{R}^{d \times (d-k)}$. This yields the representation

$$X = \mu + \Gamma_1Y_1 + \Gamma_2Y_2 = \mu + \Gamma_1Y_1 + \varepsilon, \tag{3.70}$$

where $\varepsilon = \Gamma_2Y_2$ can be regarded as an error since its covariance matrix contains very
small entries in comparison with the covariance matrix of $\Gamma_1Y_1$.

Figure 3.9. Barplot of the sample variances $l_j$ of the first eight principal components; above
each bar the cumulative proportion of the total variance explained by the components is given.

Equation (3.70) is reminiscent of the factor model (3.60), except that the errors
do not have a diagonal covariance matrix. Nevertheless, the principal components
approach to constructing a factor model is generally to equate the first k principal
components $Y_1$ with the factors $F$, to equate the matrix $\Gamma_1$ containing the first k
eigenvectors with the factor loading matrix $B$, and to ignore the errors entirely. The
factors are thus given by $F_t = (\gamma_1'X_t, \dots, \gamma_k'X_t)'$.
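A minimal sketch of the PCA-based factor construction follows, using sample quantities in place of the theoretical ones. It assumes that the factors are taken as the first k uncentred principal component scores, so that the constant vector absorbs the mean; the simulated data, the choice k = 2 and all variable names are hypothetical. The code also shows that the ignored error term has a non-diagonal covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 1000, 6, 2

# Hypothetical data with two strong common drivers plus idiosyncratic noise.
drivers = rng.normal(size=(n, 2))
weights = rng.normal(size=(d, 2))
X = 0.05 + drivers @ weights.T + 0.3 * rng.normal(size=(n, d))

X_bar = X.mean(axis=0)
S_x = (X - X_bar).T @ (X - X_bar) / n
l, G = np.linalg.eigh(S_x)
order = np.argsort(l)[::-1]
l, G = l[order], G[:, order]
G1, G2 = G[:, :k], G[:, k:]               # first k loading vectors and the rest

F = X @ G1                                # factor data: row t is (g_1'X_t, ..., g_k'X_t)
B = G1                                    # loading matrix
a = X_bar - G1 @ (G1.T @ X_bar)           # constant vector absorbing the mean
resid = X - a - F @ B.T                   # error term that the approach ignores
resid_cov = resid.T @ resid / n

explained = l[:k].sum() / l.sum()
off_diag = resid_cov - np.diag(np.diag(resid_cov))
print(f"proportion of variance explained by {k} factors: {explained:.3f}")
print("largest off-diagonal residual covariance:", np.abs(off_diag).max().round(4))
```

The residual covariance equals the part of the spectral decomposition belonging to the discarded eigenvectors, which is generally not diagonal; this is the sense in which (3.70) falls short of the strict factor-model definition.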


Example 3.38 (PCA-based factor model for Dow Jones 30 returns). We con- 
sider the data in Example 3.36 again. Principal components analysis is applied to 
the sample covariance matrix of the return data and the results are summarized in 
Figures 3.9 and 3.10. In the former we see a barplot of the sample variances of the 
first eight principal components $l_j$; above each bar the cumulative proportion of
the total variance explained by the components is given; the first two components
explain almost 50% of the variation. In the latter plot the first two loading vectors
$g_1$ and $g_2$ are summarized.

The first vector of loadings is positively weighted for all stocks and can be thought 
of as describing a kind of index portfolio; of course the weights in the loading 
vector do not sum to one, but they can be scaled to do so and this gives a so-called 
principal-component-mimicking portfolio. The second vector has positive weights 
for the consumer titles and negative weights for the technology titles; as a portfolio it 
can be thought of as prescribing a programme of short-selling of technology to buy 
the consumer titles. These first two sample principal components loadings vectors
are used to define factors.

Figure 3.10. Barplot summarizing the loadings vectors $g_1$ and $g_2$ defining the first two
principal components: (a) factor 1 loadings; and (b) factor 2 loadings.


In Table 3.7 the transpose of the matrix B (containing the loadings estimates in the 
factor model) is shown; the rows are merely the first two loadings vectors from the 
principal components analysis. In the third row, values of $r^2$, the so-called coefficient
of determination, are given for each of the univariate regression models and these 
indicate that more of the variation in the data is explained by the two PCA-based 
factors than was explained by the observed factor in Example 3.36; Intel returns 
seem to be best explained by the model. 

The next 10 lines give the correlation matrix implied by the factor model (corresponding
to $\hat{\Sigma}^{(F)}$). Compared with the true sample correlation matrix in Example 3.36
this seems to pick up more of the structure than did the correlation matrix implied 
by the observed factor model. 

The final 10 lines show the estimated correlation matrix of the residuals from the 
regression model, but only those elements which exceed 0.1 in absolute value. The 
residuals are again less correlated than the original data, but there are quite a number 
of larger entries, indicating imperfections in the factor-model representation of the 
data. In particular, we have introduced a number of larger negative correlations into 
the residuals; in practice, we seldom expect to find a factor model where the residuals 
have a covariance matrix that appears perfectly diagonal. 


Notes and Comments 


We have based our discussion of factor models, multivariate regression, statistical
factor models and principal components on Mardia, Kent and Bibby (1979). Statistical
approaches to factor models are also treated in Seber (1984) and Johnson and
Wichern (2002).

Table 3.7. The first two lines give estimates of the transpose of B for a factor model fitted
to 10 Dow Jones 30 stocks, where the factors are constructed from the first two sample
principal components. The third row gives $r^2$ values for the univariate regression model for
each individual time series. The next 10 lines give the correlation matrix implied by the
factor model. The final 10 lines show the estimated correlation matrix of the residuals from
the regression model, with entries less than 0.1 in absolute value omitted. See Example 3.38
for full details.


           MO     KO     EK    HWP   INTC   MSFT    IBM    MCD    WMT    DIS

B'       0.20   0.19   0.16   0.45   0.51   0.44   0.32   0.18   0.24   0.22
         0.39   0.34   0.23  -0.26  -0.45  -0.10  -0.07   0.31   0.39   0.37
r^2      0.35   0.42   0.18   0.55   0.75   0.56   0.35   0.34   0.42   0.41

MO       1.00   0.39   0.25   0.17   0.13   0.25   0.20   0.35   0.38   0.38
KO       0.39   1.00   0.28   0.21   0.17   0.29   0.23   0.38   0.42   0.42
EK       0.25   0.28   1.00   0.18   0.15   0.22   0.18   0.25   0.28   0.27
HWP      0.17   0.21   0.18   1.00   0.64   0.55   0.43   0.20   0.23   0.23
INTC     0.13   0.17   0.15   0.64   1.00   0.61   0.48   0.16   0.19   0.18
MSFT     0.25   0.29   0.22   0.55   0.61   1.00   0.44   0.27   0.31   0.30
IBM      0.20   0.23   0.18   0.43   0.48   0.44   1.00   0.21   0.25   0.24
MCD      0.35   0.38   0.25   0.20   0.16   0.27   0.21   1.00   0.38   0.37
WMT      0.38   0.42   0.28   0.23   0.19   0.31   0.25   0.38   1.00   0.41
DIS      0.38   0.42   0.27   0.23   0.18   0.30   0.24   0.37   0.41   1.00

MO       1.00  -0.19  -0.15                              -0.19  -0.37  -0.26
KO      -0.19   1.00  -0.15          0.11                       -0.16  -0.17
EK      -0.15  -0.15   1.00                              -0.15  -0.16  -0.16
HWP                           1.00  -0.63  -0.37  -0.14
INTC            0.11         -0.63   1.00  -0.24  -0.31
MSFT                         -0.37  -0.24   1.00  -0.22
IBM                          -0.14  -0.31  -0.22   1.00
MCD     -0.19         -0.15                               1.00  -0.19  -0.19
WMT     -0.37  -0.16  -0.16                              -0.19   1.00  -0.23
DIS     -0.26  -0.17  -0.16                              -0.19  -0.23   1.00


We have simply spoken of observed and unobserved or latent factor models, but in 
the econometrics literature a classification of factor models into three types is more 
common; these are macroeconomic factor models, fundamental factor models and 
statistical factor models. In this categorization our observable factor model would be 
a macroeconomic factor model; index returns, along with other observables such as 
interest rates or inflation, are the kind of macroeconomic variables that are typically 
used as explanatory factors in such models. On the other hand, both approaches to 
calibrating a latent factor model (classical factor analysis and principal components) 
would fall under the heading of statistical factor models. 

The fundamental factor models, which are not described in this book, relate to 
the situation where factors are unobserved, but the loading matrix B is assumed to 
be known. More precisely we consider a situation where we have clear ideas of how 
to group returns by geographical or industrial sector, firm size or other important 
characteristics. For example, the return of a European technology company might
be expected to be explained by an unobserved factor representing the performance 
of such companies, or perhaps by two unobserved factors representing European 
and technology companies. Using data for many companies and regression methods 
it is then possible to estimate the unobserved factors using the known classification 
information. 

For a more detailed discussion of factor models see the paper by Connor (1995), 
which provides a comparison of the three types of model, and the book of Campbell, 
Lo and MacKinlay (1997). An excellent practical introduction to these models with 
examples in S-Plus is Zivot and Wang (2003). Other accounts of factor models and 
PCA in finance are found in Alexander (2001) and Tsay (2002). 


4 


Financial Time Series 


In this chapter we consider time series models for financial risk-factor change data, 
in particular differenced logarithmic price and exchange-rate series. We begin by 
looking more systematically at the empirical properties of such data in a discussion 
of so-called stylized facts. 

In Section 4.2 we review essential concepts in the analysis of time series, such as 
stationarity, autocorrelations and their estimation, white noise processes, and ARMA 
(autoregressive moving-average) processes. We then devote Section 4.3 to univari- 
ate ARCH and GARCH (generalized autoregressive conditionally heteroscedastic) 
processes for capturing the important phenomenon of volatility, before showing how 
such models are used in the context of quantitative risk management in Section 4.4. 
A short introduction to concepts in the analysis of multivariate time series, such as 
cross-correlation and multivariate white noise, is found in Section 4.5, while the 
final section presents multivariate GARCH-type models for multivariate risk-factor 
change series. 

Our focus on the GARCH paradigm in this chapter requires comment. As this 
book goes to press these models have been with us for around two decades, and 
modern econometrics and finance have continued to develop other kinds of model 
for financial return series. We think here of discrete-time stochastic volatility models, 
long-memory GARCH models, continuous-time models fitted to discrete data, and 
models based on realized volatility calculated from high-frequency data; none of 
these new developments are handled in this book. 

Our emphasis on GARCH has two main motivations, the first of these being 
practical. We recall that in risk management we are typically dealing with very 
large numbers of risk factors and our philosophy, expounded in Section 1.5, is that 
broad-brush techniques that capture the main risk features of many time series are 
more important than very detailed analyses of single series. The relatively simple 
GARCH model lends itself to this approach and proves very easy to fit. There are also 
some multivariate extensions which build in fairly simple ways on the univariate 
models and may be calibrated to a multivariate series in stages. This ease of use 
contrasts with other models where even the fitting of a single series presents a 
challenging computational problem (e.g. estimation of a stochastic volatility model 
via filtering or Gibbs sampling), and multivariate extensions have not been widely 
considered. Related to this is the likelihood that an average financial enterprise will 
collect, at best, daily data on its complete set of risk factors for the purposes of risk 
management, and this will rule out some more sophisticated models that require
higher-frequency data. 

Our second reason for concentrating on ARCH and GARCH models is didactic. 
These models for volatile return series have a status akin to ARMA models in classi- 
cal time series; they belong, in our opinion, to the body of standard methodology to 
which a student of the subject should be exposed. A quantitative risk manager who 
understands GARCH has a good basis for understanding more complex models and 
a good framework for talking about volatility in a rational way. He/she may also 
appreciate more clearly the role of more ad hoc volatility estimation methods such 
as the exponentially weighted moving-average (EWMA) procedure. 


4.1 Empirical Analyses of Financial Time Series 
4.1.1 Stylized Facts 


The stylized facts of financial time series are a collection of empirical observations, 
and inferences drawn from these observations, that seem to apply to the majority of 
daily series of risk-factor changes, such as log-returns on equities, indexes, exchange 
rates and commodity prices; these observations are now so entrenched in economet- 
ric experience that they have been elevated to the status of facts. They often continue 
to hold when we go to longer time intervals, such as weekly or monthly returns, or 
to shorter time intervals, such as intra-daily returns. A version of the stylized facts 
is as follows. 


(1) Return series are not iid although they show little serial correlation.

(2) Series of absolute or squared returns show profound serial correlation.

(3) Conditional expected returns are close to zero.

(4) Volatility appears to vary over time.

(5) Return series are leptokurtic or heavy-tailed.

(6) Extreme returns appear in clusters.


In the following consider a sample of daily return data $X_1, \ldots, X_n$ and assume that
these have been formed by logarithmic differencing of a price, index or exchange-rate
series $(S_t)_{t=0,1,\ldots,n}$, so that $X_t = \ln(S_t/S_{t-1})$, $t = 1, \ldots, n$.
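
The logarithmic differencing just described can be sketched in a few lines of Python. This is an illustration only: the function name `log_returns` and the price levels are hypothetical and are not taken from the DAX data analysed below.

```python
import math


def log_returns(prices):
    """Return [X_1, ..., X_n] with X_t = ln(S_t / S_{t-1}) for prices [S_0, ..., S_n]."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical index levels S_0, ..., S_4 (illustrative values, not DAX data).
prices = [100.0, 101.0, 99.5, 102.0, 102.0]
returns = log_returns(prices)

# Log-returns telescope: their sum equals ln(S_n / S_0).
total = sum(returns)
```

Note that n + 1 prices yield n returns, and that an unchanged price gives a return of exactly zero.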


Volatility clustering. Evidence for the first two stylized facts is collected in Fig- 
ures 4.1 and 4.2. Figure 4.1(a) shows 2608 daily log-returns for the DAX index 
spanning a decade from 2 January 1985 to 30 December 1994, a period includ- 
ing both the stock-market crash of 1987 and the reunification of Germany. Parts (b) 
and (c) show series of simulated iid data from a normal and Student t model, respectively;
in both cases the model parameters have been set by fitting the model to the
real return data using the method of maximum likelihood under the iid assumption.
In the normal case this means that we simply simulate iid data with distribution
$N(\mu, \sigma^2)$, where $\mu = \bar{X} = n^{-1} \sum_{t=1}^{n} X_t$ and
$\sigma^2 = n^{-1} \sum_{t=1}^{n} (X_t - \bar{X})^2$. In the t
case the likelihood has been maximized numerically and the estimated degrees of
freedom parameter is $\nu = 3.8$.
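
For the normal case the fitting and simulation step is simple enough to sketch directly. The following Python fragment is a minimal illustration under stated assumptions: the names `fit_normal_ml` and `simulate_iid_normal` are our own, and the short list `observed` is a placeholder standing in for the 2608 DAX log-returns. The numerical maximization needed in the t case is not shown.

```python
import random


def fit_normal_ml(x):
    """Maximum likelihood estimates of the mean and variance under an iid normal model."""
    n = len(x)
    mu = sum(x) / n
    sigma2 = sum((v - mu) ** 2 for v in x) / n  # divisor n (ML), not n - 1
    return mu, sigma2


def simulate_iid_normal(mu, sigma2, n, seed=0):
    """Simulate n iid N(mu, sigma2) values with a seeded generator."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Placeholder returns (hypothetical values, not the DAX series).
observed = [0.012, -0.007, 0.003, -0.021, 0.009, 0.001]
mu_hat, sigma2_hat = fit_normal_ml(observed)
simulated = simulate_iid_normal(mu_hat, sigma2_hat, n=2608)
```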

Figure 4.1. (a) Log-returns for the DAX index from 2 January 1985 to 30 December 1994
compared with simulated iid data from (b) a normal and (c) a t distribution, where the
parameters have been determined by fitting the models to the DAX data.


The simulated normal data are clearly very different from the DAX return data 
and do not show the same range of extreme values, which does not surprise us given 
our observations on the inadequacy of the Gaussian model in Chapter 3. While the 
Student t model can generate comparable extreme values to the real data, more 
careful observation reveals that the DAX returns exhibit a phenomenon known as 
volatility clustering, which is not present in the simulated series. Volatility clustering 
is the tendency for extreme returns to be followed by other extreme returns, although 
not necessarily with the same sign. In the DAX data we see periods such as the stock- 
market crash of October 1987 or the political uncertainty in the period between late 
1989 and German reunification in 1990 which are marked by large positive and 
negative moves. 

Figure 4.2. Correlograms for (a) the three datasets in Figure 4.1 and (b) the absolute values
of these data. Dashed lines mark the standard 95% confidence intervals for the autocorrelations
of a process of iid finite-variance rvs.

In Figure 4.2 the correlograms of the raw data and the absolute data for all three
datasets are shown. The correlogram is a graphical display for estimates of serial
correlation, and its construction and interpretation are discussed in Section 4.2.3.
While there is very little evidence of serial correlation in the raw data for all datasets,
the absolute values of the real financial data appear to show evidence of serial dependence.
Clearly, more than 5% of the estimated correlations lie outside the dashed lines,
which are the 95% confidence intervals for a process of iid finite-variance rvs. This
serial dependence in the absolute returns would be equally apparent in squared return
values, and seems to confirm the presence of volatility clustering. We conclude that,
although there is no evidence against the iid hypothesis for the genuinely iid data,
there is strong evidence against the iid hypothesis for the DAX return data.
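
The counting argument used above (what share of estimated autocorrelations falls outside the 95% band?) can be sketched as follows. We assume the usual sample autocorrelation estimator and the band $\pm 1.96/\sqrt{n}$; both are treated properly in Section 4.2.3, and the function names here are hypothetical. The data are simulated iid normal values, not the DAX series.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0, ..., max_lag (standard estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / denom
            for h in range(max_lag + 1)]


def share_outside_band(x, max_lag=30):
    """Fraction of lags 1..max_lag whose sample ACF lies outside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)[1:]
    return sum(abs(r) > bound for r in acf) / max_lag


# Simulated iid data of the same length as the DAX sample (illustration only).
rng = random.Random(1)
iid = [rng.gauss(0.0, 1.0) for _ in range(2608)]
share_raw = share_outside_band(iid)
share_abs = share_outside_band([abs(v) for v in iid])
```

For genuinely iid data both shares should be in the region of 5%; for real return data we would expect `share_abs` to be much larger.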

Table 4.1. Tests of randomness for returns of 30 Dow Jones stocks in the eight-year period
1993-2000. The columns LBraw and LBabs show p-values for Ljung–Box tests applied to
the raw and absolute values, respectively.

                                   Daily            Monthly
Name                  Symbol   LBraw  LBabs     LBraw  LBabs
Alcoa                 AA        0.00   0.00      0.23   0.02
American Express      AXP       0.02   0.00      0.55   0.07
AT&T                  T         0.11   0.00      0.70   0.02
Boeing                BA        0.03   0.00      0.90   0.17
Caterpillar           CAT       0.28   0.00      0.73   0.07
Citigroup             C         0.09   0.00      0.91   0.48
Coca-Cola             KO        0.00   0.00      0.50   0.03
DuPont                DD        0.03   0.00      0.75   0.00
Eastman Kodak         EK        0.15   0.00      0.61   0.54
Exxon Mobil           XOM       0.00   0.00      0.32   0.22
General Electric      GE        0.00   0.00      0.25   0.09
General Motors        GM        0.65   0.00      0.81   0.27
Hewlett-Packard       HWP       0.09   0.00      0.21   0.02
Home Depot            HD        0.00   0.00      0.00   0.41
Honeywell             HON       0.44   0.00      0.07   0.30
Intel                 INTC      0.23   0.00      0.79   0.62
IBM                   IBM       0.18   0.00      0.67   0.28
International Paper   IP        0.15   0.00      0.01   0.09
JPMorgan              JPM       0.52   0.00      0.43   0.12
Johnson & Johnson     JNJ       0.00   0.00      0.11   0.91
McDonald's            MCD       0.28   0.00      0.72   0.68
Merck                 MRK       0.05   0.00      0.53   0.65
Microsoft             MSFT      0.28   0.00      0.19   0.13
3M                    MMM       0.00   0.00      0.57   0.33
Philip Morris         MO        0.01   0.00      0.68   0.82
Procter & Gamble      PG        0.02   0.00      0.99   0.74
SBC                   SBC       0.05   0.00      0.13   0.00
United Technologies   UTX       0.00   0.00      0.12   0.01
Wal-Mart              WMT       0.00   0.00      0.41   0.64
Disney                DIS       0.44   0.00      0.01   0.51

Table 4.1 contains more evidence against the iid hypothesis for daily stock-return
data. The Ljung–Box test of randomness (described in Section 4.2.3) has been
performed for the stocks comprising the Dow Jones 30 index in the period 1993-2000.
In the two columns for daily returns the test is applied, respectively, to the raw
return data (LBraw) and their absolute values (LBabs), and p-values are tabulated;
these show strong evidence (particularly when applied to absolute values) against the
iid hypothesis. If financial log-returns are not iid, then this contradicts the popular
random-walk hypothesis for the discrete-time development of log-prices (or, in this
case, index values). If log-returns are neither iid nor normal, then this contradicts
the geometric Brownian motion hypothesis for the continuous-time development of
prices on which the Black–Scholes–Merton pricing theory is based.
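
To indicate how p-values of the kind shown in Table 4.1 might be produced, the following Python sketch computes a Ljung–Box statistic and its p-value. Several things here are assumptions rather than details taken from the text: we use the standard form $Q = n(n+2) \sum_{j=1}^{h} \hat{\rho}(j)^2/(n-j)$ with a chi-squared reference distribution on h degrees of freedom, we fix h = 10 lags (the number of lags used for the table is not stated here), and we restrict to even h so that the chi-squared tail probability has a closed form. All function names are hypothetical.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0, ..., max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / denom
            for h in range(max_lag + 1)]


def chi2_sf_even(x, df):
    """P(chi-squared_df > x) for even df, using the closed-form finite sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def ljung_box(x, lags=10):
    """Return (Q, p-value) for the Ljung-Box test with an even number of lags."""
    n = len(x)
    acf = sample_acf(x, lags)
    q = n * (n + 2) * sum(acf[j] ** 2 / (n - j) for j in range(1, lags + 1))
    return q, chi2_sf_even(q, lags)


# Simulated iid data (illustration only): test raw and absolute values.
rng = random.Random(7)
iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]
q_raw, p_raw = ljung_box(iid)
q_abs, p_abs = ljung_box([abs(v) for v in iid])
```

Applied to real daily returns, the analogue of `p_abs` would typically be very close to zero, as in the LBabs column for daily data.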

Moreover, if there is serial dependence in financial return data, then the question 
arises: to what extent can this dependence be used to make predictions about the 
future? This is the subject of the third and fourth stylized facts. It is very difficult to 
predict the return in the next time period based on historical data alone. This can be 
explained to some extent by the lack of serial correlation in the raw return series data.
For some series we do see evidence of correlations at the first lag (or first few lags); 
for example, a small positive correlation at the first lag might suggest that there is 
some discernible tendency for a return with a particular sign (positive or negative) 
to be followed in the next period by a return with the same sign. However, this is 
not apparent in the DAX data, which suggests that our best guess for tomorrow’s 
return based on our observations up to today is zero. This idea is expressed in the 
assertion of the third stylized fact, that conditional expected returns are close to 
zero. 

Volatility is often formally modelled as the conditional standard deviation of 
financial returns given historical information and, although the conditional expected 
returns are consistently close to zero, the presence of volatility clustering suggests 
that conditional standard deviations are continually changing in a partly predictable 
manner. If we know that returns have been large in the last few days, due to mar- 
ket excitement, then there is reason to believe that the distribution from which 
tomorrow’s return is “drawn” should have a large variance. It is this idea that 
lies behind the time series models for changing volatility that we will examine 
in Section 4.3. 


Tails and extremal behaviour. We have already observed in Chapter 3 that the 
normal distribution is a poor model for daily and longer-interval returns (whether 
univariate or multivariate). The Jarque–Bera test, which is based on empirical skewness
and kurtosis measures given in (3.15), clearly rejects the normal hypothesis
(see Example 3.3). In particular, daily financial return data appear to have a much 
higher kurtosis than is consistent with the normal distribution; their distribution is 
said to be leptokurtic, meaning that it is more narrow in the centre but has longer 
and heavier tails than the normal distribution. 

Further empirical analysis often suggests that the distribution of daily or other 
short-interval financial return data has tails that decay slowly according to a power 
law, rather than the faster, exponential-type decay of the tails of a normal distribution.
This means that we tend to see rather more extreme values than might be expected in 
such return data; we discuss this phenomenon further in Chapter 7, which is devoted 
to extreme value theory (EVT). 

Figure 4.3. Time series plots of the 100 largest negative values for (a) the DAX returns
and (c) the simulated t data as well as (b), (d) QQplots of the waiting times between these
extreme values against an exponential reference distribution.

A further observation about extremes is, however, pertinent to our discussion
of serial dependence in financial return data. In the discussion of Figure 4.1 we
remarked that there is a tendency for the extreme values in return series to occur in
close succession in volatility clusters; further evidence for this phenomenon is given
in Figure 4.3, where the time series of the 100 largest daily losses for the DAX returns
and the 100 largest values for the simulated t data are plotted. In Section 7.4.1 of
Chapter 7 we summarize theory which suggests that the very largest values in iid data
will occur like events in a Poisson process, separated by waiting times that are iid with
an exponential distribution. Parts (b) and (d) of Figure 4.3 show QQplots of these
waiting times against an exponential reference distribution. While the hypothesis of
the Poisson occurrence of extreme values for the iid data is supported, there are too
many short waiting times caused by the clustering of extreme values in the DAX
data to support the exponential hypothesis; this constitutes further evidence against
the iid hypothesis for return data.
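
The waiting-time analysis can be sketched in Python as follows. This is an illustration under our own assumptions: losses are taken to be negative returns, the plotting positions $(i - 0.5)/m$ for the QQ plot are one common convention among several, the function names are hypothetical, and the data are simulated iid values rather than the DAX returns.

```python
import math
import random


def extreme_waiting_times(returns, n_extremes=100):
    """Gaps (in observations) between the times of the n_extremes largest losses."""
    losses = [-r for r in returns]
    largest = sorted(range(len(losses)), key=lambda t: losses[t], reverse=True)[:n_extremes]
    times = sorted(largest)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def exponential_qq_points(waits):
    """Pairs (standard exponential quantile, ordered waiting time) for a QQ plot."""
    m = len(waits)
    ordered = sorted(waits)
    theoretical = [-math.log(1.0 - (i - 0.5) / m) for i in range(1, m + 1)]
    return list(zip(theoretical, ordered))


# Simulated iid data (illustration only).
rng = random.Random(3)
iid = [rng.gauss(0.0, 1.0) for _ in range(2608)]
waits = extreme_waiting_times(iid, n_extremes=100)
qq = exponential_qq_points(waits)
```

If the extremes occur like a Poisson process, the points in `qq` should lie close to a straight line through the origin; clustering shows up as an excess of very short waiting times.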


Longer-interval return series. As we progressively increase the interval of the 
returns by moving from daily to weekly, monthly, quarterly and yearly data, the 
phenomena we have identified tend to become less pronounced. Volatility clustering 
decreases and returns begin to look both more iid and less heavy-tailed. 

Suppose we begin with a sample of n returns measured at some time interval
(say daily or weekly) and aggregate these to form longer-interval log-returns. The
h-period log-return at time t is given by

\[
X_t^{(h)} = \ln\left(\frac{S_t}{S_{t-h}}\right)
          = \ln\left(\frac{S_t}{S_{t-1}} \cdots \frac{S_{t-h+1}}{S_{t-h}}\right)
          = \sum_{j=0}^{h-1} X_{t-j}, \tag{4.1}
\]

and from our original sample we can form a sample of non-overlapping h-period
returns $\{X_t^{(h)} : t = h, 2h, \ldots, \lfloor n/h \rfloor h\}$, where $\lfloor \cdot \rfloor$
denotes the integer part. Due
to the sum structure of the h-period returns, it is to be expected that some central
limit effect takes place, whereby their distribution becomes less leptokurtic and
more normal as h is increased. Note that, although we have cast doubt on the iid
model for daily data, a central limit theorem applies to many stationary time series
processes, including the GARCH models that are a focus of this chapter.

In Table 4.1 the Ljung–Box tests of randomness have also been applied to
non-overlapping monthly return data. For 20 out of 30 stocks the null hypothesis of iid
data is not rejected at the 5% level in Ljung–Box tests applied to both the raw and
absolute returns. Thus it is harder to find evidence of serial dependence in such
monthly returns.

Aggregating data to form non-overlapping h-period returns reduces the sample
size from n to n/h, and for longer-period returns (such as quarterly or yearly returns)
this may be a very serious reduction in the amount of data. An alternative in this
case is to form overlapping returns. For $1 \le k \le h$, a general recipe for forming
aggregated h-period returns (overlapping or non-overlapping) is to form

\[
\left\{ X_t^{(h)} = \sum_{j=0}^{h-1} X_{t-j} :
        t = h, h+k, h+2k, \ldots, h + \left\lfloor \frac{n-h}{k} \right\rfloor k \right\}; \tag{4.2}
\]

this would give $(1 + \lfloor (n-h)/k \rfloor)$ values that overlap by an amount $(h - k)$. In
forming overlapping returns we can preserve a large number of data, but we do
build additional serial dependence into the data. Even if the original data were iid,
overlapping data would be dependent.
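
The aggregation recipe (4.2) translates directly into code. In the following Python sketch the function name `aggregate_returns` is hypothetical, the list `x` holds $X_1, \ldots, X_n$ in positions 0 to n - 1, and the default k = h gives the non-overlapping case of (4.1).

```python
def aggregate_returns(x, h, k=None):
    """Form h-period returns from x = [X_1, ..., X_n], stepping k periods at a time.

    k = h (the default) gives non-overlapping returns; k < h gives returns
    that overlap by h - k periods.
    """
    if k is None:
        k = h
    n = len(x)
    if not (1 <= k <= h <= n):
        raise ValueError("need 1 <= k <= h <= n")
    aggregated = []
    for m in range((n - h) // k + 1):
        t = h + m * k                      # 1-based end time of the h-period window
        aggregated.append(sum(x[t - h:t]))  # X_{t-h+1} + ... + X_t
    return aggregated


# Illustrative values only.
x = [1, 2, 3, 4, 5, 6, 7]
non_overlapping = aggregate_returns(x, h=3)       # windows ending at t = 3, 6
overlapping = aggregate_returns(x, h=3, k=1)      # windows ending at t = 3, ..., 7
```

With n = 7 and h = 3 the non-overlapping scheme keeps only two values, whereas k = 1 keeps five, at the price of consecutive values sharing two of their three summands.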


4.1.2 Multivariate Stylized Facts 


In risk management we are seldom interested in single financial time series, but rather
in multiple series of financial risk-factor changes. The stylized facts identified in
Section 4.1.1 may be augmented by a number of stylized facts of a multivariate 
nature. 

We now consider multivariate return data $X_1, \ldots, X_n$. Each component series
$X_{1,j}, \ldots, X_{n,j}$ for $j = 1, \ldots, d$ is a series formed by logarithmic differencing of a
daily price, index or exchange-rate series as before. We consider the following set
of multivariate stylized facts.


(M1) Multivariate return series show little evidence of cross-correlation, except for 
contemporaneous returns. 


(M2) Multivariate series of absolute returns show profound evidence of cross- 
correlation. 


(M3) Correlations between series (i.e. between contemporaneous returns) vary over 
time. 


(M4) Extreme returns in one series often coincide with extreme returns in several 
other series. 

The first two observations are fairly obvious extensions of univariate stylized facts
(1) and (2) from Section 4.1.1. Just as the stock returns for, say, General Motors
on days t and t + h (for h > 0) show very little serial correlation, so we generally
detect very little correlation between the General Motors return on day t and, say,
the Coca-Cola return on day t + h. Of course, stock returns on the same day may 
show non-negligible correlation, due to factors that affect the whole market on that 
day. When we look at absolute returns we should bear in mind that periods of high 
or low volatility are generally common to more than one stock. Thus returns of 
large magnitude in one stock may tend to be followed on subsequent days by further 
returns of large magnitude for both that stock and other stocks, which would explain 
(M2). The issue of cross-correlation and its estimation will be addressed with an 
example in Section 4.5. 

Stylized fact (M3) is a multivariate counterpart to univariate observation (4): 
that volatility appears to vary with time. While the latter appears “obvious” to the 
naked eye from illustrations such as Figure 4.1, the multivariate observation is less 
straightforward to demonstrate. In fact, although it is widely believed that corre- 
lations change, there are various ways of interpreting this stylized fact in terms of 
underlying models. We may believe that correlations are constant within regimes but 
that there is evidence of relatively frequent regime changes. Or we may believe that 
correlation changes continually and dynamically like volatility. Just as volatility is 
often formally modelled as the conditional standard deviation of returns given his- 
torical information, we can also devise models that feature a conditional correlation 
that is allowed to change dynamically. We consider some models of this kind in 
Section 4.6. 

The only way to collect evidence for (M3) and to decide in what way correlation 
changes is to fit different models for changing correlation and then to make formal 
statistical comparisons of the models. More ad hoc attempts to demonstrate (M3) 
should generally be avoided. For example, it is not sufficient to calculate correlations 
between two daily series for monthly samples and to observe that these values may 
vary greatly from month to month; there is considerable error in estimating correla- 
tions from small samples, particularly when the underlying distribution is something 
more like a heavy-tailed multivariate t distribution than a Gaussian distribution (see
also Example 3.31 in this context). 
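
The size of this estimation error is easy to underestimate, and a small simulation makes the point. The following Python sketch is an illustration under our own assumptions: we simulate bivariate Gaussian data with a constant correlation of 0.5 (the heavy-tailed case mentioned above would only make matters worse), we take a "month" to be 21 trading days, and the function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample (Pearson) correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def block_correlations(x, y, window):
    """Correlations computed on consecutive non-overlapping blocks of length window."""
    return [pearson(x[s:s + window], y[s:s + window])
            for s in range(0, len(x) - window + 1, window)]


# Ten "years" of daily data with a constant true correlation (illustration only).
rho = 0.5
rng = random.Random(42)
z1 = [rng.gauss(0.0, 1.0) for _ in range(2520)]
z2 = [rng.gauss(0.0, 1.0) for _ in range(2520)]
x = z1
y = [rho * a + math.sqrt(1.0 - rho ** 2) * b for a, b in zip(z1, z2)]

monthly = block_correlations(x, y, window=21)
spread = max(monthly) - min(monthly)
```

Although the true correlation never changes, the monthly estimates in `monthly` range widely around 0.5, which is why month-to-month variation in sample correlations is not by itself evidence for (M3).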

Figure 4.4. Three extreme days on which German stock returns (in this case (a) BMW and
(b) Siemens) showed large negative returns. The dates are 19 October 1987, 16 October 1989
and 19 August 1991; see text for historical commentary.

Stylized fact (M4) is encountered in other forms; one often hears the view that
“correlations go to one in times of market stress”. The idea this observation attempts
to express is that in volatile periods the level of dependence between, for example,
various stock returns appears to be higher. Consider, for example, Figure 4.4, which
shows the BMW and Siemens log-return series for the same 1985-1994 time period as
in Figure 4.1. In both the time series plots and the scatterplot three days on which
large negative returns occurred for both stocks have been marked with a number; all
of these days occurred during periods of volatility on the German market. They are,
respectively, 19 October 1987, Black Monday on Wall Street; 16 October 1989, when
over 100 000 Germans protested against the East German regime in Leipzig during
the chain of events that led to the fall of the Berlin Wall and German reunification; and
19 August 1991, the day of the coup by communist hardliners during the reforming
era of Gorbachev in the USSR. Clearly, the returns on these extreme days are lined
up on the diagonal of the scatterplot in the lower-left-hand corner and it is easy to
see why one might describe these as occasions on which correlations tend to one.

While it may be partly true that useful multivariate time series models for returns 
should have the property that conditional correlations tend to become large when 
volatilities are large, the phenomenon of simultaneous extreme values can also be 
addressed by choosing distributions in multivariate models that allow so-called tail 
or extremal dependence; a mathematical definition of this notion and a discussion 
of its importance may be found in Section 5.2.3 of the chapter on copulas. 


Notes and Comments 


A number of texts contain extensive empirical analyses of financial return series and 
discussions of their properties. We mention in particular Taylor (1986), Alexander 
(2001), Tsay (2002) and Zivot and Wang (2003). For more discussion of the random- 
walk hypothesis for stock returns and its shortcomings see Lo and MacKinlay (1999). 


4.2 Fundamentals of Time Series Analysis 


This section provides a short summary of the essentials of classical univariate time 
series analysis with a focus on that which is relevant for modelling risk-factor return 
series. We have based the presentation on Brockwell and Davis (1991, 2002), so 
these texts may be used as supplementary reading. 


4.2.1 Basic Definitions 


A time series model for a single risk factor is a stochastic process $(X_t)_{t \in \mathbb{Z}}$, i.e. a family
of rvs, indexed by the integers and defined on some probability space $(\Omega, \mathcal{F}, P)$.

Moments of a time series. Assuming they exist, we define the mean function $\mu(t)$
and autocovariance function $\gamma(t, s)$ of $(X_t)_{t \in \mathbb{Z}}$ by

\[
\begin{aligned}
\mu(t) &= E(X_t), & t &\in \mathbb{Z}, \\
\gamma(t, s) &= E((X_t - \mu(t))(X_s - \mu(s))), & t, s &\in \mathbb{Z}.
\end{aligned}
\]

It follows that the autocovariance function satisfies $\gamma(t, s) = \gamma(s, t)$ for all t, s, and
$\gamma(t, t) = \operatorname{var}(X_t)$.


Stationarity. Generally, the processes we consider will be stationary in one or both 
of the following two senses. 


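Definition 4.1 (strict stationarity). The time series $(X_t)_{t \in \mathbb{Z}}$ is strictly stationary if

\[
(X_{t_1}, \ldots, X_{t_n}) \overset{d}{=} (X_{t_1 + k}, \ldots, X_{t_n + k})
\]

for all $t_1, \ldots, t_n, k \in \mathbb{Z}$ and for all $n \in \mathbb{N}$.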


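Definition 4.2 (covariance stationarity). The time series $(X_t)_{t \in \mathbb{Z}}$ is covariance
stationary (or weakly or second-order stationary) if the first two moments exist and
satisfy

\[
\begin{aligned}
\mu(t) &= \mu, & t &\in \mathbb{Z}, \\
\gamma(t, s) &= \gamma(t + k, s + k), & t, s, k &\in \mathbb{Z}.
\end{aligned}
\]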


Both these definitions attempt to formalize the notion that the behaviour of a time 
series is similar in any epoch in which we might observe it. Systematic changes in 
mean, variance or the covariances between equally spaced observations are incon- 
sistent with stationarity. 

It may be easily verified that a strictly stationary time series with finite variance 
is covariance stationary, but it is important to note that we may define infinite- 
variance processes (including certain ARCH and GARCH processes) which are 
strictly stationary but not covariance stationary. 


Autocorrelation in stationary time series. The definition of covariance stationarity
implies that for all s, t we have $\gamma(t - s, 0) = \gamma(t, s) = \gamma(s, t) = \gamma(s - t, 0)$, so
that the covariance between $X_t$ and $X_s$ only depends on their temporal separation
$|s - t|$, which is known as the lag. Thus, for a covariance-stationary process we
write the autocovariance function as a function of one variable:

\[
\gamma(h) := \gamma(h, 0), \quad \forall h \in \mathbb{Z}.
\]

Noting that $\gamma(0) = \operatorname{var}(X_t)$, $\forall t$, we can now define the autocorrelation function of
a covariance-stationary process.


Definition 4.3 (autocorrelation function). The autocorrelation function (ACF)
$\rho(h)$ of a covariance-stationary process $(X_t)_{t \in \mathbb{Z}}$ is

\[
\rho(h) = \rho(X_h, X_0) = \gamma(h)/\gamma(0), \quad \forall h \in \mathbb{Z}.
\]

We speak of the autocorrelation or serial correlation $\rho(h)$ at lag h. In classical time
series analysis the set of serial correlations and their empirical analogues estimated
from data are the objects of principal interest. The study of autocorrelations is known
as analysis in the time domain.

White noise processes. The basic building blocks for creating useful time series
models are stationary processes without serial correlation, known as white noise
processes and defined as follows.


Definition 4.4 (white noise). $(X_t)_{t \in \mathbb{Z}}$ is a white noise process if it is covariance
stationary with autocorrelation function

\[
\rho(h) =
\begin{cases}
1, & h = 0, \\
0, & h \neq 0.
\end{cases}
\]

A white noise process centred to have mean zero with variance $\sigma^2 = \operatorname{var}(X_t)$
will be denoted WN$(0, \sigma^2)$. A simple example of a white noise process is a series
of iid rvs with finite variance, and this is known as a strict white noise process.


Definition 4.5 (strict white noise). $(X_t)_{t \in \mathbb{Z}}$ is a strict white noise process if it is a
series of iid, finite-variance rvs.


A strict white noise (SWN) process centred to have mean zero and variance $\sigma^2$
will be denoted SWN$(0, \sigma^2)$. Although SWN is the easiest kind of noise process to
understand, it is not the only noise that we will use. We will later see that
covariance-stationary ARCH and GARCH processes are in fact white noise processes.


Martingale difference. One further noise concept that we use, particularly when we
come to discuss volatility and GARCH processes, is that of a martingale-difference
sequence. To discuss this concept we further assume that the time series $(X_t)_{t \in \mathbb{Z}}$
is adapted to some filtration $(\mathcal{F}_t)_{t \in \mathbb{Z}}$ which represents the accrual of information
over time. The sigma algebra $\mathcal{F}_t$ represents the available information at time t and
typically this will be the information contained in past and present values of the time
series itself $(X_s)_{s \le t}$, which we refer to as the history up to time t and denote by
$\mathcal{F}_t = \sigma(\{X_s : s \le t\})$; the corresponding filtration is known as the natural filtration.

In a martingale-difference sequence the expectation of the next value, given cur- 
rent information, is always zero, and we have observed in Section 4.1.1 that this 
property may be appropriate for financial return data. A martingale difference is 
often said to model our winnings in consecutive rounds of a fair game. 


Definition 4.6 (martingale difference). The time series $(X_t)_{t \in \mathbb{Z}}$ is known as a
martingale-difference sequence with respect to the filtration $(\mathcal{F}_t)_{t \in \mathbb{Z}}$ if $E|X_t| < \infty$,
$X_t$ is $\mathcal{F}_t$-measurable (adapted) and

\[
E(X_t \mid \mathcal{F}_{t-1}) = 0, \quad \forall t \in \mathbb{Z}.
\]

Obviously the unconditional mean of such a process is also zero:

\[
E(X_t) = E(E(X_t \mid \mathcal{F}_{t-1})) = 0, \quad \forall t \in \mathbb{Z}.
\]

Moreover, if $E(X_t^2) < \infty$ for all t, then autocovariances satisfy

\[
\gamma(t, s) = E(X_t X_s) =
\begin{cases}
E(E(X_t X_s \mid \mathcal{F}_{s-1})) = E(X_t E(X_s \mid \mathcal{F}_{s-1})) = 0, & t < s, \\
E(E(X_t X_s \mid \mathcal{F}_{t-1})) = E(X_s E(X_t \mid \mathcal{F}_{t-1})) = 0, & t > s.
\end{cases}
\]

Thus a finite-variance martingale-difference sequence has zero mean and zero
covariance. If the variance is constant for all t, it is a white noise process.
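
A simulation helps to make the distinction between white noise and strict white noise concrete. The following Python sketch anticipates Section 4.3 and rests on an assumption not yet introduced at this point in the text: we use the ARCH(1) recursion $X_t = \sigma_t Z_t$, $\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2$ with iid standard normal $Z_t$, which is a martingale-difference sequence with respect to its natural filtration. The parameter values and function names are hypothetical choices made for illustration.

```python
import math
import random


def simulate_arch1(n, alpha0=0.7, alpha1=0.3, burn_in=500, seed=0):
    """Simulate X_t = sigma_t * Z_t with sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2."""
    rng = random.Random(seed)
    x_prev = 0.0
    path = []
    for t in range(n + burn_in):
        sigma2 = alpha0 + alpha1 * x_prev ** 2
        x_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        if t >= burn_in:          # discard the burn-in period
            path.append(x_prev)
    return path


def lag_autocorrelation(x, h):
    """Sample autocorrelation of x at lag h."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(dev[t] * dev[t + h] for t in range(n - h)) / sum(d * d for d in dev)


x = simulate_arch1(20000)
acf_raw = lag_autocorrelation(x, 1)
acf_squared = lag_autocorrelation([v * v for v in x], 1)
```

The raw series shows essentially no correlation at lag one, as a white noise process should, whereas the squared series is clearly correlated, so the simulated values cannot be iid.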


4.2.2 ARMA Processes 


The family of classical ARMA processes is widely used in many traditional applications
of time series analysis. They are covariance-stationary processes that are
constructed using white noise as a basic building block. As a general notational
convention in this section and the remainder of the chapter we will denote white
noise by $(\varepsilon_t)_{t \in \mathbb{Z}}$ and strict white noise by $(Z_t)_{t \in \mathbb{Z}}$.


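Definition 4.7 (ARMA process). Let $(\varepsilon_t)_{t \in \mathbb{Z}}$ be WN$(0, \sigma_\varepsilon^2)$. The process $(X_t)_{t \in \mathbb{Z}}$ is
a zero-mean ARMA(p, q) process if it is a covariance-stationary process satisfying
difference equations of the form

\[
X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p}
  = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q},
  \quad \forall t \in \mathbb{Z}. \tag{4.3}
\]

$(X_t)$ is an ARMA process with mean $\mu$ if the centred series $(X_t - \mu)_{t \in \mathbb{Z}}$ is a
zero-mean ARMA(p, q) process.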


Note that, according to our definition, there is no such thing as a non-covariance- 
stationary ARMA process. Whether the process is strictly stationary or not will 
depend on the exact nature of the driving white noise, also known as the process 
of innovations. If the innovations are iid, or themselves form a strictly stationary 
process, then the ARMA process will also be strictly stationary. 

For all practical purposes we can restrict our study of ARMA processes to causal
ARMA processes. By this we mean processes satisfying the equations (4.3) which
have a representation of the form

\[
X_t = \sum_{i=0}^{\infty} \psi_i \varepsilon_{t-i}, \tag{4.4}
\]

where the $\psi_i$ are coefficients which must satisfy

\[
\sum_{i=0}^{\infty} |\psi_i| < \infty. \tag{4.5}
\]

Remark 4.8. The so-called absolute summability condition (4.5) is a technical
condition which ensures that $E|X_t| < \infty$. This guarantees that the infinite sum
in (4.4) converges absolutely, almost surely, meaning that both
$\sum_{i=0}^{\infty} |\psi_i| |\varepsilon_{t-i}|$ and $\sum_{i=0}^{\infty} \psi_i \varepsilon_{t-i}$
are finite with probability one (see Brockwell and Davis 1991, Proposition 3.1.1).


We now verify by direct calculation that causal ARMA processes are indeed 
covariance stationary and calculate the form of their autocorrelation function before 
going on to look at some simple standard examples. 


Proposition 4.9. Any process satisfying (4.4) and (4.5) is covariance stationary
with an autocorrelation function given by

$$\rho(h) = \frac{\sum_{i=0}^{\infty} \psi_i \psi_{i+|h|}}{\sum_{i=0}^{\infty} \psi_i^2}, \quad h \in \mathbb{Z}. \tag{4.6}$$
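Proof. Obviously, for all $t$ we have $E(X_t) = 0$ and $\operatorname{var}(X_t) = \sigma_\varepsilon^2 \sum_{i=0}^{\infty} \psi_i^2 < \infty$,
due to (4.5). Moreover, the autocovariances are given by

$$\operatorname{cov}(X_t, X_{t+h}) = E(X_t X_{t+h}) = E\Big(\sum_{i=0}^{\infty} \psi_i \varepsilon_{t-i} \sum_{j=0}^{\infty} \psi_j \varepsilon_{t+h-j}\Big).$$

Since $(\varepsilon_t)$ is white noise, it follows that $E(\varepsilon_{t-i}\varepsilon_{t+h-j}) \neq 0 \iff j = i + h$, and
hence that

$$\gamma(h) = \operatorname{cov}(X_t, X_{t+h}) = \sigma_\varepsilon^2 \sum_{i=0}^{\infty} \psi_i \psi_{i+|h|}, \quad h \in \mathbb{Z}, \tag{4.7}$$

which depends only on the lag $h$ and not on $t$. The autocorrelation function follows
easily.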


Example 4.10 (MA(q) process). It is clear that a pure moving-average process

$$X_t = \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} + \varepsilon_t \tag{4.8}$$

forms a simple example of a causal process of the form (4.4). It is easily inferred
from (4.6) that the autocorrelation function is given by

$$\rho(h) = \frac{\sum_{i=0}^{q-|h|} \theta_i \theta_{i+|h|}}{\sum_{i=0}^{q} \theta_i^2}, \quad |h| \in \{0, 1, \ldots, q\},$$

where $\theta_0 = 1$. For $|h| > q$ we have $\rho(h) = 0$ and the autocorrelation function is
said to cut off at lag q. If this feature is observed in the estimated autocorrelations 
of empirical data, it is often taken as an indicator of moving-average behaviour. A 
realization of an MA(4) process together with the theoretical form of its ACF is 
shown in Figure 4.5. 
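
Formula (4.6) is straightforward to evaluate numerically once the coefficients are known. The following Python sketch computes $\rho(h)$ from a finite list of $\psi$-weights and applies it to the MA(4) coefficients quoted in the caption of Figure 4.5, for which $\psi_0 = 1$ and $\psi_i = \theta_i$. The function name `acf_from_psi` is our own choice for illustration and does not come from any library.

```python
def acf_from_psi(psi, max_lag):
    """Autocorrelations rho(0), ..., rho(max_lag) from formula (4.6).

    psi holds psi_0, psi_1, ...; weights beyond the end of the list are
    treated as zero (exact for a pure MA process, a truncation otherwise).
    """
    denom = sum(w * w for w in psi)
    rho = []
    for h in range(max_lag + 1):
        # range() is empty once h >= len(psi), so the sum is then zero
        num = sum(psi[i] * psi[i + h] for i in range(len(psi) - h))
        rho.append(num / denom)
    return rho


# MA(4) coefficients from Figure 4.5(b), with theta_0 = 1 prepended.
theta = [1.0, -0.8, 0.4, 0.2, -0.3]
rho = acf_from_psi(theta, max_lag=6)
print([round(r, 4) for r in rho])
```

The printed values at lags 5 and 6 are zero, which is the cut-off at lag $q = 4$ described above.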


Example 4.11 (AR(1) process). The first-order AR process satisfies the set of
difference equations

$$X_t = \phi X_{t-1} + \varepsilon_t, \quad \forall t. \tag{4.9}$$

This process is causal if and only if $|\phi| < 1$, and this may be understood intuitively
by iterating the equation (4.9) to get

$$X_t = \phi(\phi X_{t-2} + \varepsilon_{t-1}) + \varepsilon_t = \cdots = \phi^{k+1} X_{t-k-1} + \sum_{i=0}^{k} \phi^i \varepsilon_{t-i}.$$

Using more careful probabilistic arguments it may be shown that the condition
$|\phi| < 1$ ensures that the first term disappears as $k \to \infty$ and the second term
converges. The process

$$X_t = \sum_{i=0}^{\infty} \phi^i \varepsilon_{t-i} \tag{4.10}$$

turns out to be the unique solution of the defining equations (4.9). It may be easily
verified that this is a process of the form (4.4) and that $\sum_{i=0}^{\infty} |\phi|^i = (1 - |\phi|)^{-1}$,
so that (4.5) is satisfied. Looking at the form of the solution (4.10), we see that the
AR(1) process can be represented as an MA($\infty$) process: an infinite-order
moving-average process.

Figure 4.5. A number of simulated ARMA processes with their autocorrelation
functions (dashed) and correlograms. Innovations are Gaussian. (a) AR(1), $\phi = 0.8$.
(b) MA(4), $\theta = -0.8, 0.4, 0.2, -0.3$. (c) ARMA(1, 1), $\phi = 0.6$, $\theta = 0.5$.

The autocovariance and autocorrelation functions of the process may be calculated
from (4.7) and (4.6) to be

$$\gamma(h) = \frac{\phi^{|h|}\sigma_\varepsilon^2}{1 - \phi^2}, \qquad \rho(h) = \phi^{|h|}, \quad h \in \mathbb{Z}.$$

Thus the ACF is exponentially decaying with possibly alternating sign. A realization
of an AR(1) process together with the theoretical form of its ACF is shown in
Figure 4.5.
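
The calculation behind the AR(1) autocovariance and autocorrelation is short. Substituting $\psi_i = \phi^i$ from (4.10) into (4.7) and summing a geometric series (valid because $|\phi| < 1$) gives the following worked step, which uses only (4.6), (4.7) and (4.10):

```latex
\gamma(h) = \sigma_\varepsilon^2 \sum_{i=0}^{\infty} \phi^{i}\,\phi^{i+|h|}
          = \sigma_\varepsilon^2\,\phi^{|h|} \sum_{i=0}^{\infty} \phi^{2i}
          = \frac{\phi^{|h|}\,\sigma_\varepsilon^2}{1-\phi^2},
\qquad
\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \phi^{|h|}.
```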
Remarks on general ARMA theory. In the case of the general ARMA process of
Definition 4.7, the issue of whether this process has a causal representation of the
form (4.4) is resolved by the study of two polynomials in the complex plane, which
are given in terms of the ARMA model parameters by

$$\tilde\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p,$$
$$\tilde\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q.$$

Provided that $\tilde\phi(z)$ and $\tilde\theta(z)$ have no common roots, the ARMA process is a
causal process satisfying (4.4) and (4.5) if and only if $\tilde\phi(z)$ has no roots in the unit
circle $|z| \le 1$. The coefficients $\psi_i$ in the representation (4.4) are determined by the
equation

$$\sum_{i=0}^{\infty} \psi_i z^i = \frac{\tilde\theta(z)}{\tilde\phi(z)}, \quad |z| \le 1.$$


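Example 4.12 (ARMA(1, 1) process). For the process given by

$$X_t - \phi X_{t-1} = \varepsilon_t + \theta \varepsilon_{t-1}, \quad \forall t \in \mathbb{Z},$$

the complex polynomials are $\tilde\phi(z) = 1 - \phi z$ and $\tilde\theta(z) = 1 + \theta z$ and these have no
common roots provided $\phi + \theta \neq 0$. The solution of $\tilde\phi(z) = 0$ is $z = 1/\phi$ and this
is outside the unit circle provided $|\phi| < 1$, so that this is the condition for causality
(as in the AR(1) model of Example 4.11).

The representation (4.4) can be obtained by considering

$$\sum_{i=0}^{\infty} \psi_i z^i = \frac{1 + \theta z}{1 - \phi z} = (1 + \theta z)(1 + \phi z + \phi^2 z^2 + \cdots), \quad |z| \le 1,$$

and is easily calculated to be

$$X_t = \varepsilon_t + (\phi + \theta) \sum_{i=1}^{\infty} \phi^{i-1} \varepsilon_{t-i}. \tag{4.11}$$

Using (4.6) we may calculate that for $h \neq 0$ the ACF is

$$\rho(h) = \frac{\phi^{|h|-1}(\phi + \theta)(1 + \phi\theta)}{1 + \theta^2 + 2\phi\theta}.$$

A realization of an ARMA(1, 1) process together with the theoretical form of its
ACF is shown in Figure 4.5.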


Invertibility. Equation (4.11) shows how the ARMA(1, 1) process may be thought
of as an MA($\infty$) process. In fact, if we impose the condition $|\theta| < 1$, we can also
express $(X_t)$ as the AR($\infty$) process given by

$$X_t = \varepsilon_t + (\phi + \theta) \sum_{i=1}^{\infty} (-\theta)^{i-1} X_{t-i}. \tag{4.12}$$

If we rearrange this to be an equation for $\varepsilon_t$, then we see that we can, in a sense,
"reconstruct" the latest innovation $\varepsilon_t$ from the entire history of the process $(X_s)_{s \le t}$.
The condition $|\theta| < 1$ is known as an invertibility condition, and for the general
ARMA$(p, q)$ process the invertibility condition is that $\tilde\theta(z)$ should have no roots
in the unit circle $|z| \le 1$. In practice, the models we fit to real data will be both
invertible and causal solutions of the ARMA-defining equations.
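
The MA($\infty$) and AR($\infty$) coefficients can be generated numerically by matching powers of $z$ in the ratio of the two ARMA polynomials. The sketch below is a minimal illustration under the assumption that the model is already known to be causal and invertible; it does not check the root conditions itself. The helper names `power_series_ratio` and `arma_polynomials` are hypothetical. For the ARMA(1, 1) case the output can be compared with the closed forms in (4.11) and (4.12).

```python
def power_series_ratio(num, den, n_terms):
    """First n_terms coefficients c_0, c_1, ... of num(z) / den(z).

    num and den are coefficient lists in increasing powers of z, with
    den[0] == 1. The coefficients follow from matching powers of z in
    den(z) * c(z) = num(z).
    """
    coeffs = []
    for j in range(n_terms):
        c = num[j] if j < len(num) else 0.0
        for i in range(1, min(j, len(den) - 1) + 1):
            c -= den[i] * coeffs[j - i]
        coeffs.append(c)
    return coeffs


def arma_polynomials(phi, theta):
    """Coefficient lists of 1 - phi_1 z - ... and 1 + theta_1 z + ..."""
    return [1.0] + [-a for a in phi], [1.0] + list(theta)


phi, theta = [0.6], [0.5]                 # ARMA(1, 1) parameters of Figure 4.5(c)
ar_poly, ma_poly = arma_polynomials(phi, theta)

# psi_i: weights of the MA(infinity) representation, as in (4.11)
psi_weights = power_series_ratio(ma_poly, ar_poly, 8)
# pi_j: weights with eps_t = sum_j pi_j X_{t-j}; -pi_j is the coefficient in (4.12)
pi_weights = power_series_ratio(ar_poly, ma_poly, 8)
print(psi_weights[:4], pi_weights[:4])
```

The same two calls apply unchanged to general ARMA$(p, q)$ parameter lists; only the comparison with (4.11) and (4.12) is specific to the ARMA(1, 1) case.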


Models for the conditional mean. Consider a general invertible ARMA model with 
non-zero mean. For what comes later it will be useful to observe that we can write 
such models as 


$$X_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu + \sum_{i=1}^{p} \phi_i (X_{t-i} - \mu) + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}. \tag{4.13}$$

Since we have assumed invertibility, the terms $\varepsilon_{t-j}$, and hence $\mu_t$, can be written in
terms of the infinite past of the process up to time $t - 1$; $\mu_t$ is said to be measurable
with respect to $\mathcal{F}_{t-1} = \sigma(\{X_s : s \le t - 1\})$.

If we make the assumption that the white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$ is a martingale-difference
sequence (see Definition 4.6) with respect to $(\mathcal{F}_t)_{t\in\mathbb{Z}}$, then $E(X_t \mid \mathcal{F}_{t-1}) = \mu_t$.
In other words, such an ARMA process can be thought of as putting a particular
structure on the conditional mean $\mu_t$ of the process. ARCH and GARCH processes
will later be seen to put structure on the conditional variance $\operatorname{var}(X_t \mid \mathcal{F}_{t-1})$.


ARIMA models. In traditional time series analysis we often consider an even larger
class of model known as ARIMA, or autoregressive integrated moving-average
models. Let $\nabla$ denote the difference operator, so that for a time series process
$(Y_t)_{t\in\mathbb{Z}}$ we have $\nabla Y_t = Y_t - Y_{t-1}$. Denote repeated differencing by $\nabla^d$, where

$$\nabla^d Y_t = \begin{cases} \nabla Y_t, & d = 1, \\ \nabla^{d-1}(\nabla Y_t) = \nabla^{d-1}(Y_t - Y_{t-1}), & d > 1. \end{cases} \tag{4.14}$$

The time series $(Y_t)$ is said to be an ARIMA$(p, d, q)$ process if the differenced
series $(X_t)$ given by $X_t = \nabla^d Y_t$ is an ARMA$(p, q)$ process. For $d \ge 1$ ARIMA
processes are non-stationary processes. They are popular in practice because the
operation of differencing (once or more than once) can turn a dataset that is obviously
"non-stationary" into a dataset that might plausibly be modelled by a stationary
ARMA process. For example, if we use an ARMA$(p, q)$ process to model daily
log-returns of some price series $(S_t)$, then we are really saying that the original
logarithmic price series $(\ln S_t)$ follows an ARIMA$(p, 1, q)$ model.

When the word integrated is used in the context of time series it generally implies 
that we are looking at a non-stationary process that might be made stationary by 
differencing; see also the discussion of IGARCH models in Section 4.3.2. 
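
The differencing operation in (4.14) amounts to a short loop. The sketch below applies it to a small, made-up price series to obtain log-returns; the numbers are illustrative only and the function name `difference` is our own.

```python
import math


def difference(series, d=1):
    """Apply the difference operator d times, following (4.14)."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


# Hypothetical prices S_t (illustrative numbers only).
prices = [100.0, 101.5, 100.8, 102.3, 103.1]
log_prices = [math.log(s) for s in prices]
log_returns = difference(log_prices, d=1)   # X_t = ln S_t - ln S_{t-1}
print(log_returns)
```

Each application of the operator shortens the series by one observation, so a sample of $n$ values of $(Y_t)$ yields $n - d$ values of $(X_t)$.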


4.2.3 Analysis in the Time Domain 


We now assume that we have a random sample $X_1, \ldots, X_n$ from a
covariance-stationary time series model $(X_t)_{t\in\mathbb{Z}}$. Analysis in the time domain involves
calculating empirical estimates of autocovariances and autocorrelations from this random
sample and using these estimates to make inference about the serial dependence
structure of the underlying process.
Correlogram. The sample autocovariances are calculated according to

$$\hat\gamma(h) = \frac{1}{n} \sum_{t=1}^{n-h} (X_{t+h} - \bar{X})(X_t - \bar{X}), \quad 0 \le h < n,$$

where $\bar{X} = \sum_{t=1}^{n} X_t / n$ is the sample mean, which estimates $\mu$, the mean of the
time series. From these we calculate the sample ACF:

$$\hat\rho(h) = \hat\gamma(h)/\hat\gamma(0), \quad 0 \le h < n.$$

The correlogram is the plot $\{(h, \hat\rho(h)) : h = 0, 1, 2, \ldots\}$, which is designed to
facilitate the interpretation of the sample ACF. Correlograms for various simulated ARMA
processes are shown in Figure 4.5; note that the estimated correlations correspond
reasonably closely to the theoretical ACF for these particular realizations.

To interpret such estimators of serial correlation, we need to know something 
about their behaviour for particular time series. The following general result is for 
causal linear processes, which are processes of the form (4.4) driven by strict white 
noise. 


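Theorem 4.13. Let $(X_t)_{t\in\mathbb{Z}}$ be the linear process

$$X_t - \mu = \sum_{i=0}^{\infty} \psi_i Z_{t-i}, \quad \text{where } \sum_{i=0}^{\infty} |\psi_i| < \infty, \ (Z_t)_{t\in\mathbb{Z}} \sim \mathrm{SWN}(0, \sigma_Z^2).$$

Suppose that either $E(Z_t^4) < \infty$ or $\sum_{i=0}^{\infty} i\psi_i^2 < \infty$. Then, for $h \in \{1, 2, \ldots\}$, we
have

$$\sqrt{n}\,(\hat{\boldsymbol\rho}(h) - \boldsymbol\rho(h)) \xrightarrow{d} N_h(0, W),$$

where

$$\hat{\boldsymbol\rho}(h) = (\hat\rho(1), \ldots, \hat\rho(h))',$$
$$\boldsymbol\rho(h) = (\rho(1), \ldots, \rho(h))',$$

and $W$ has elements

$$W_{ij} = \sum_{k=1}^{\infty} (\rho(k+i) + \rho(k-i) - 2\rho(i)\rho(k))(\rho(k+j) + \rho(k-j) - 2\rho(j)\rho(k)).$$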


Proof. This follows as a special case of a result in Brockwell and Davis (1991, 
pp. 221-223). 


The condition $\sum_{i=0}^{\infty} i\psi_i^2 < \infty$ holds for ARMA processes, so ARMA processes
driven by SWN fall under the scope of this theorem (regardless of whether fourth
moments exist for the innovations or not).

Trivially, the theorem also applies to SWN itself. For SWN we have

$$\sqrt{n}\,\hat{\boldsymbol\rho}(h) \xrightarrow{d} N_h(0, I_h),$$

so for sufficiently large $n$ the sample autocorrelations of data from an SWN
process will behave like iid normal observations with mean zero and variance
$1/n$. Ninety-five per cent of the estimated correlations should lie in the interval
$(-1.96/\sqrt{n}, 1.96/\sqrt{n})$, and it is for this reason that correlograms are drawn with
confidence bands at these values. If more than 5% of estimated correlations lie out- 
side these bounds, then this is considered as evidence against the null hypothesis 
that the data are strict white noise. 
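
The sample ACF and the $\pm 1.96/\sqrt{n}$ bands can be computed in a few lines. The following sketch uses the $1/n$ normalization of the sample autocovariance and counts how many estimated correlations of a simulated Gaussian SWN series fall outside the bands. The seed, sample size and number of lags are arbitrary choices made for illustration, and `sample_acf` is our own helper name.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0..max_lag) with the 1/n normalisation."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    gamma0 = sum(d * d for d in dev) / n
    acf = []
    for h in range(max_lag + 1):
        gamma_h = sum(dev[t + h] * dev[t] for t in range(n - h)) / n
        acf.append(gamma_h / gamma0)
    return acf


random.seed(0)                                    # arbitrary seed, for reproducibility
n = 1000
z = [random.gauss(0.0, 1.0) for _ in range(n)]    # simulated Gaussian SWN
rho_hat = sample_acf(z, max_lag=20)
band = 1.96 / math.sqrt(n)
outside = sum(1 for r in rho_hat[1:] if abs(r) > band)
print(f"band = +/-{band:.4f}, lags outside band: {outside} of 20")
```

For data that really are SWN one would expect roughly one lag in twenty to fall outside the bands, so a single exceedance is not in itself evidence against the hypothesis.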


Remark 4.14. In light of the discussion of the asymptotic behaviour of sample 
autocorrelations for SWN it might be asked how these estimators behave for white 
noise. However, this is an extremely general question because white noise encom- 
passes a variety of possible underlying processes (including the standard ARCH 
and GARCH processes we later address) which only share second-order properties 
(finiteness of variance and lack of serial correlation). In some cases the standard 
Gaussian confidence bands apply; in some cases they do not. For a GARCH process
the critical issue turns out to be the heaviness of the tail of the stationary distribution 
(see Mikosch and Starica 2000, for more details). 


Portmanteau tests. It is often useful to combine the visual analysis of the
correlogram with a formal numerical test of the strict white noise hypothesis, and a
popular test is that of Ljung and Box, as applied in Section 4.1.1. Under the null
hypothesis of SWN, the statistic

$$Q_{LB} = n(n+2) \sum_{j=1}^{h} \frac{\hat\rho(j)^2}{n - j}$$

has an asymptotic chi-squared distribution with $h$ degrees of freedom. This statistic
is generally preferred to the simpler Box-Pierce statistic $Q_{BP} = n \sum_{j=1}^{h} \hat\rho(j)^2$,
which also has an asymptotic $\chi_h^2$ distribution under the null hypothesis, although
the chi-squared approximation may not be so good in smaller samples. These tests
are the most commonly applied portmanteau tests.

If a series of rvs forms an SWN process, then the series of absolute or squared 
variables must also be iid. It is a good idea to also apply the correlogram and Ljung- 
Box tests to absolute values as a further test of the SWN hypothesis. We prefer to 
perform tests of the SWN hypothesis on the absolute values rather than the squared 
values because the squared series is only an SWN (according to the definition we 
use) when the underlying series has a finite fourth moment. Daily log-return data 
often point to models with an infinite fourth moment.
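
Both portmanteau statistics follow directly from the sample autocorrelations. The sketch below computes the Ljung-Box statistic $n(n+2)\sum_{j=1}^{h}\hat\rho(j)^2/(n-j)$ and the Box-Pierce statistic $n\sum_{j=1}^{h}\hat\rho(j)^2$ for a series and for its absolute values. The data are simulated purely for illustration and the function name `ljung_box` is hypothetical. The chi-squared quantile is not computed here, since the Python standard library has no chi-squared distribution function; it would have to be taken from tables or a statistics package.

```python
import random


def ljung_box(x, h):
    """Return (Q_LB, Q_BP) based on sample autocorrelations at lags 1..h."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    gamma0 = sum(d * d for d in dev)          # the 1/n factors cancel in rho_hat
    rho = [sum(dev[t + j] * dev[t] for t in range(n - j)) / gamma0
           for j in range(1, h + 1)]
    q_bp = n * sum(r * r for r in rho)
    q_lb = n * (n + 2) * sum(r * r / (n - j) for j, r in enumerate(rho, start=1))
    return q_lb, q_bp


random.seed(0)                                     # arbitrary seed
z = [random.gauss(0.0, 1.0) for _ in range(500)]   # simulated SWN, illustration only
q_raw, _ = ljung_box(z, h=10)
q_abs, _ = ljung_box([abs(v) for v in z], h=10)
# Compare with the 95% quantile of the chi-squared distribution with
# 10 degrees of freedom, which is approximately 18.31.
print(q_raw, q_abs)
```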


4.2.4 Statistical Analysis of Time Series 


In practice, the statistical analysis of time series data $X_1, \ldots, X_n$ follows a
programme consisting of the following stages.


Preliminary analysis. The data are plotted and the plausibility of a single station- 
ary model is considered. Since we concentrate here on differenced logarithmic value 
series, we will assume that at most minor preliminary manipulation of our data is 
required. Classical time series analysis has many techniques for removing trends 
and seasonalities from “non-stationary” data; these techniques are discussed in all 
standard texts including Brockwell and Davis (2002) and Chatfield (1996). While
certain kinds of financial time series certainly do show seasonal patterns, such as
earnings time series, we will assume that such effects are relatively minor in the 
kinds of daily or weekly return series that are the basis of risk-management meth- 
ods. If we were to base our risk management on high-frequency data, preliminary 
cleaning would be more of an issue, since these show clear diurnal cycles and other 
deterministic features (see Dacorogna et al. 2001). 

Obviously the assumption of stationarity becomes more questionable if we take 
long data windows, or if we choose windows in which well-known economic policy 
shifts have taken place. Although the markets change constantly there will always be 
a tension between our desire to use the most up-to-date data and our need to include 
enough data to have precision in statistical estimation. Whether half a year of data, 
one year, five years or 10 years are appropriate will depend on the situation. It is 
certainly a good idea to perform a number of analyses with different data windows 
and to investigate the sensitivity of statistical inference to the amount of data. 


Analysis in the time domain. Having settled on the data, the techniques of Sec- 
tion 4.2.3 come into play. By applying correlograms and portmanteau tests such as 
Ljung-Box to both the raw data and their absolute values, the SWN hypothesis is
evaluated. If it cannot be rejected for the data in question, then the formal time series 
analysis is over and simple distributional fitting could be used instead of dynamic 
modelling. 

For daily risk-factor return series we expect to quickly reject the SWN hypothesis. 
Despite the fact that correlograms of the raw data may show little evidence of serial 
correlation, correlograms of the absolute data are likely to show evidence of strong 
serial dependence. In other words the data may support a white noise model but not 
a strict white noise. In this case ARMA modelling is not required, but the volatility 
models of Section 4.3 may be useful. 

If the correlogram does provide evidence of the kind of serial correlation patterns 
produced by ARMA processes, then we can attempt to fit ARMA processes to data. 


Model fitting. A traditional approach to model fitting first attempts to identify the 
order of a suitable ARMA process using the correlogram and a further tool known 
as the partial correlogram (not described in this book but found in all standard texts). 
For example, the presence of a cut-off at lag q in the correlogram (see Example 4.10) 
is taken as a diagnostic for pure moving-average behaviour of order q (and simi- 
lar behaviour in a partial correlogram indicates pure AR behaviour). With modern 
computing power it is now quite easy to simply fit a variety of MA, AR and ARMA 
models and to use a model-selection criterion like that of Akaike (described in Sec- 
tion A.3.6) to choose the “best” model. There are also automated model choice 
procedures such as the method of Tsay and Tiao (1984). 

Sometimes there are a priori reasons for expecting certain kinds of model to 
be most appropriate. For example, suppose we analyse longer-period returns that 
overlap, as in Equation 4.2. Consider the case where the raw data are daily returns 
and we build weekly returns. In (4.2) we set $h = 5$ (to get weekly returns) and $k = 1$
(to get as much data as possible). Assuming that the underlying data are genuinely
from a white noise process $(X_t)_{t\in\mathbb{Z}} \sim \mathrm{WN}(0, \sigma^2)$, the weekly aggregated returns
at times $t$ and $t + l$ satisfy

$$\operatorname{cov}(X_t^{(5)}, X_{t+l}^{(5)}) = \operatorname{cov}\Big(\sum_{i=0}^{4} X_{t-i}, \sum_{j=0}^{4} X_{t+l-j}\Big) = \begin{cases} (5 - l)\sigma^2, & l = 0, \ldots, 4, \\ 0, & l \ge 5, \end{cases}$$

so that the overlapping returns have the correlation structure of an MA(4) process,
and this would be a natural choice of time series model for them.
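
The covariance of overlapping sums is $(5 - l)\sigma^2$ for $l = 0, \ldots, 4$ and zero for $l \ge 5$, which can be confirmed by counting the daily observations that two overlapping weekly sums share. The sketch below does this count exactly rather than by simulation; `overlap_cov` is a hypothetical helper and the default $\sigma^2 = 1$ is an arbitrary choice.

```python
def overlap_cov(lag, h=5, sigma2=1.0):
    """Covariance of two h-period sums of white noise ending `lag` periods apart.

    Only pairs of identical underlying observations contribute, each with
    variance sigma2, so the covariance is sigma2 times the number of index
    pairs (i, j) with t - i == t + lag - j.
    """
    shared = sum(1 for i in range(h) for j in range(h) if j == i + lag)
    return sigma2 * shared


cov = [overlap_cov(l) for l in range(8)]
acf = [c / cov[0] for c in cov]
print(cov)
print(acf)
```

The resulting autocorrelations decline linearly and vanish from lag 5 onwards, which is the cut-off behaviour of an MA(4) process.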

Having chosen the model to fit, there are a number of possible fitting methods, 
including specialized methods for AR processes, such as Yule-Walker, that make 
minimal assumptions concerning the distribution of the white noise innovations; 
we refer to the standard time series literature for more details. In Section 4.3.4 we 
discuss the method of (conditional) maximum likelihood, which may be used to fit 
ARMA models with (or without) GARCH errors to data. 


Residual analysis and model comparison. Recall the representation of a causal
and invertible ARMA process in (4.13) and suppose we have fitted such a process
and estimated the parameters $\phi_i$ and $\theta_j$. The residuals are inferred values $\hat\varepsilon_t$ for the
unobserved innovations $\varepsilon_t$ and they are calculated recursively from the data and
fitted model by

$$\hat\varepsilon_t = X_t - \hat\mu_t, \qquad \hat\mu_t = \hat\mu + \sum_{i=1}^{p} \hat\phi_i (X_{t-i} - \hat\mu) + \sum_{j=1}^{q} \hat\theta_j \hat\varepsilon_{t-j}, \tag{4.15}$$

where the values $\hat\mu_t$ are sometimes known as the fitted values. Obviously, we have a
problem calculating the first few values of $\hat\varepsilon_t$ due to the finiteness of our data sample
and the infinite nature of the recursions (4.15). One of many possible solutions might
be to set $\hat\varepsilon_{-q+1} = \hat\varepsilon_{-q+2} = \cdots = \hat\varepsilon_0 = 0$ and $X_{-p+1} = X_{-p+2} = \cdots = X_0 = \bar{X}$
and then to use (4.15) for $t = 1, \ldots, n$. Since the first few values will be influenced
by these starting values, they might be ignored in later analyses.
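
The residual recursion (4.15), with pre-sample residuals set to zero and pre-sample observations set to the sample mean, can be sketched as follows. The parameter values and the deterministic stand-in for the innovations are made up for the purpose of the illustration, so that the recovered residuals can be compared with the values used to build the series. The function name `arma_residuals` is our own.

```python
import math


def arma_residuals(x, mu, phi, theta):
    """Residuals from recursion (4.15), started with zero pre-sample residuals
    and pre-sample observations equal to the sample mean."""
    p, q = len(phi), len(theta)
    xbar = sum(x) / len(x)
    xs = [xbar] * p + list(x)        # X_{-p+1}, ..., X_0, X_1, ..., X_n
    eps = [0.0] * q                  # eps_{-q+1}, ..., eps_0
    for t in range(len(x)):
        fitted = mu
        fitted += sum(phi[i] * (xs[p + t - 1 - i] - mu) for i in range(p))
        fitted += sum(theta[j] * eps[q + t - 1 - j] for j in range(q))
        eps.append(xs[p + t] - fitted)
    return eps[q:]


# Illustration: ARMA(1, 1) with made-up parameters and a deterministic
# sequence standing in for the innovations.
phi1, theta1 = 0.6, 0.5
eps_true = [math.sin(3.0 * k) for k in range(1, 101)]
x, prev_x, prev_e = [], 0.0, 0.0
for e in eps_true:
    cur = phi1 * prev_x + e + theta1 * prev_e
    x.append(cur)
    prev_x, prev_e = cur, e
res = arma_residuals(x, 0.0, [phi1], [theta1])
print(abs(res[0] - eps_true[0]), abs(res[-1] - eps_true[-1]))
```

In this illustration the discrepancy caused by the starting values shrinks by a factor $|\theta|$ at each step, which is why the first few residuals are the ones affected.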


The residuals $(\hat\varepsilon_t)$ should behave like a realization of a white noise process,
since this is our model assumption for the innovations, and this can be assessed by 
constructing their correlogram. If there is still evidence of serial correlation in the 
correlogram, then this suggests that a good ARMA model has not yet been found. 
Moreover, we can use portmanteau tests to test formally that the residuals behave 
like a realization of a strict white noise process. If the residuals behave like SWN, 
then no further time series modelling is required; if they behave like WN but not 
SWN, then the volatility models of Section 4.3 may be required. 

It is usually possible to find more than one reasonable ARMA model for the 
data, and formal model-comparison techniques may be required to decide on an 
overall best model or models. The Akaike model-selection criterion described in 
Section A.3.6 might be used, or one of a number of variants on this criterion which 
are often preferred for time series (see Brockwell and Davis 2002, Section 5.5.2). 


4.2.5 Prediction 


There are many approaches to the forecasting or prediction of time series and we 
summarize two which extend easily to the case of GARCH models. The first strategy 
makes use of fitted ARMA (or ARIMA) models and is sometimes called the Box-Jenkins
approach (Box and Jenkins 1970). The second strategy is a model-free
approach to forecasting known as exponential smoothing, which is related to the 
exponentially weighted moving-average technique for predicting volatility. 


Prediction using ARMA models. Consider the invertible ARMA model and its
representation in (4.13). Let $\mathcal{F}_t$ denote the history of the process up to and
including time $t$ as before and assume that the innovations $(\varepsilon_t)_{t\in\mathbb{Z}}$ have the
martingale-difference property with respect to $(\mathcal{F}_t)_{t\in\mathbb{Z}}$.

For the prediction problem it will be convenient to denote our sample of $n$ data
by $X_{t-n+1}, \ldots, X_t$. We assume these are realizations of rvs following a particular
ARMA model. Our aim is to predict $X_{t+1}$ or more generally $X_{t+h}$, and we denote
our prediction by $P_t X_{t+h}$. The method we describe assumes that we have access
to the infinite history of the process up to time $t$ and derives a formula that is then
approximated for our finite sample.

As a predictor of $X_{t+h}$ we use the conditional expectation $E(X_{t+h} \mid \mathcal{F}_t)$. Among
all predictions $P_t X_{t+h}$ based on the infinite history of the process up to time $t$, this
predictor minimizes the mean squared prediction error $E((X_{t+h} - P_t X_{t+h})^2)$.

The basic idea is that, for $h \ge 1$, the prediction $E(X_{t+h} \mid \mathcal{F}_t)$ is recursively
evaluated in terms of $E(X_{t+h-1} \mid \mathcal{F}_t)$. We use the fact that $E(\varepsilon_{t+h} \mid \mathcal{F}_t) = 0$ (the
martingale-difference property of innovations) and that the rvs $(X_s)_{s \le t}$ and $(\varepsilon_s)_{s \le t}$
are "known" at time $t$. The assumption of invertibility (4.12) ensures that the
innovation $\varepsilon_t$ can be written as a function of the infinite history of the process $(X_s)_{s \le t}$.
To illustrate the approach it will suffice to consider an ARMA(1, 1) model, the
generalization to ARMA$(p, q)$ models following easily.


Example 4.15 (prediction for the ARMA(1, 1) model). Suppose an ARMA(1, 1)
model of the form (4.13) has been fitted to the data and its parameters $\mu$, $\phi$ and $\theta$
have been determined. Our one-step prediction for $X_{t+1}$ is

$$E(X_{t+1} \mid \mathcal{F}_t) = \mu_{t+1} = \mu + \phi(X_t - \mu) + \theta\varepsilon_t,$$

since $E(\varepsilon_{t+1} \mid \mathcal{F}_t) = 0$. For a two-step prediction we get

$$E(X_{t+2} \mid \mathcal{F}_t) = E(\mu_{t+2} \mid \mathcal{F}_t) = \mu + \phi(E(X_{t+1} \mid \mathcal{F}_t) - \mu) = \mu + \phi^2(X_t - \mu) + \phi\theta\varepsilon_t,$$

and in general we have

$$E(X_{t+h} \mid \mathcal{F}_t) = \mu + \phi^h(X_t - \mu) + \phi^{h-1}\theta\varepsilon_t.$$

Without knowing all historical values of $(X_s)_{s \le t}$ this predictor cannot be
evaluated exactly, but it can be accurately approximated if $n$ is reasonably large. The
easiest way of doing this is to substitute the model residual $\hat\varepsilon_t$ calculated from (4.15)
for $\varepsilon_t$. Note that $\lim_{h\to\infty} E(X_{t+h} \mid \mathcal{F}_t) = \mu$, almost surely, so that the prediction
converges to the estimate of the unconditional mean of the process for longer time
horizons.
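
The recursive structure of the ARMA(1, 1) predictions is easy to reproduce. The sketch below starts from the one-step prediction $\mu + \phi(X_t - \mu) + \theta\varepsilon_t$ and then applies $E(X_{t+h} \mid \mathcal{F}_t) = \mu + \phi(E(X_{t+h-1} \mid \mathcal{F}_t) - \mu)$ repeatedly. All numerical inputs are hypothetical; in practice the last residual from the fitted model would take the place of $\varepsilon_t$.

```python
def arma11_forecasts(x_t, eps_t, mu, phi, theta, max_h):
    """h-step predictions E(X_{t+h} | F_t) for h = 1..max_h, computed recursively."""
    preds = [mu + phi * (x_t - mu) + theta * eps_t]      # one-step prediction
    for _ in range(max_h - 1):
        # future innovations have conditional mean zero, so only phi remains
        preds.append(mu + phi * (preds[-1] - mu))
    return preds


# Hypothetical fitted values, last observation and last residual.
mu, phi, theta = 0.1, 0.6, 0.5
x_t, eps_t = 1.2, -0.3
preds = arma11_forecasts(x_t, eps_t, mu, phi, theta, max_h=50)
print(preds[:3], preds[-1])
```

With $|\phi| < 1$ the predictions decay geometrically towards $\mu$, in line with the limit noted at the end of the example.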
Exponential smoothing. This is a popular technique which is used for both
prediction of time series and trend estimation. Here we do not necessarily assume that
the data come from a stationary model, although we do assume that there is no 
deterministic seasonal component in the model. In general the method is less well 
suited to return series with frequently changing signs and is better suited to undiffer- 
enced price or value series. It forms the basis of a very common method of volatility 
prediction (see Section 4.4.1). 

Suppose our data represent realizations of rvs $Y_{t-n+1}, \ldots, Y_t$, considered without
reference to any concrete parametric model. As a forecast for $Y_{t+1}$ we use a prediction
of the form

$$P_t Y_{t+1} = \sum_{i=0}^{n-1} \alpha(1 - \alpha)^i Y_{t-i}, \quad \text{where } 0 < \alpha < 1.$$

Thus we weight the data from most recent to most distant with a sequence of
exponentially decreasing weights that sum to almost one. It is easily calculated that

$$P_t Y_{t+1} = \sum_{i=0}^{n-1} \alpha(1 - \alpha)^i Y_{t-i} = \alpha Y_t + (1 - \alpha) \sum_{j=0}^{n-2} \alpha(1 - \alpha)^j Y_{t-1-j} = \alpha Y_t + (1 - \alpha) P_{t-1} Y_t, \tag{4.16}$$

so that the prediction at time $t$ is obtained from the prediction at time $t - 1$ by
a simple recursive scheme. The choice of $\alpha$ is subjective; the larger the value the
more weight is put on the most recent observation. Empirical validation studies with
different datasets can be used to determine a value of $\alpha$ that gives good results.

Note that, although the method is commonly seen as a model-free forecasting 
technique, it can be shown to be the natural prediction method based on conditional 
expectation for a non-stationary ARIMA (0, 1, 1) model. 
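
The equivalence between the weighted sum $\sum_{i=0}^{n-1}\alpha(1-\alpha)^i Y_{t-i}$ and the recursion $P_t Y_{t+1} = \alpha Y_t + (1-\alpha)P_{t-1}Y_t$ can be checked numerically. The sketch below implements both forms; the recursion is started from a prediction of zero before the first observation, which is the convention that reproduces the finite weighted sum exactly. The data values and the choice $\alpha = 0.3$ are arbitrary.

```python
def exp_smooth_direct(y, alpha):
    """Weighted sum of alpha (1 - alpha)^i Y_{t-i}; the most recent value is last in y."""
    return sum(alpha * (1 - alpha) ** i * v for i, v in enumerate(reversed(y)))


def exp_smooth_recursive(y, alpha):
    """Recursion (4.16), started from a prediction of zero before the first value."""
    pred = 0.0
    for v in y:
        pred = alpha * v + (1 - alpha) * pred
    return pred


y = [10.0, 10.4, 10.1, 10.9, 11.2, 11.0]    # hypothetical value series
alpha = 0.3
print(exp_smooth_direct(y, alpha), exp_smooth_recursive(y, alpha))
```

Because the weights sum to $1 - (1-\alpha)^n$ rather than to one, the prediction for a short sample is biased towards zero; with a longer history the missing weight becomes negligible.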


Notes and Comments 


There are many texts covering the subject of classical time series analysis including 
Box and Jenkins (1970), Priestley (1981), Abraham and Ledolter (1983), Brockwell 
and Davis (1991, 2002), Hamilton (1994) and Chatfield (1996). Our account of basic 
concepts, ARMA models and analysis in the time domain closely follows Brockwell 
and Davis (1991), which should be consulted for the rigorous background to ideas we 
can only summarize. We have not discussed analysis of time series in the frequency 
domain, which is less common for financial time series; for this subject see, again, 
Brockwell and Davis (1991) or Priestley (1981). 

For more on tests of the strict white noise hypothesis (that is, tests of randomness),
see Brockwell and Davis (2002). Original references for the Box-Pierce and
Ljung-Box tests are Box and Pierce (1970) and Ljung and Box (1978).

There is a vast literature on forecasting and prediction in linear models. A 
good non-mathematical introduction is found in Chatfield (1996). The approach 
we describe based on the infinite history of the time series is discussed in greater 
detail in Hamilton (1994). Brockwell and Davis (2002) concentrate on exact linear 
prediction methods for finite samples. A general review of exponential smoothing
is found in Gardner (1985). 


4.3 GARCH Models for Changing Volatility 


The most important models for daily risk-factor return series are addressed in this 
section. We give definitions of ARCH (autoregressive conditionally heteroscedastic) 
and GARCH (generalized ARCH) models and discuss some of their mathematical 
properties before going on to talk about their use in practice. 


4.3.1 ARCH Processes 


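Definition 4.16. Let $(Z_t)_{t\in\mathbb{Z}}$ be SWN(0, 1). The process $(X_t)_{t\in\mathbb{Z}}$ is an ARCH$(p)$
process if it is strictly stationary and if it satisfies, for all $t \in \mathbb{Z}$ and some strictly
positive-valued process $(\sigma_t)_{t\in\mathbb{Z}}$, the equations

$$X_t = \sigma_t Z_t, \tag{4.17}$$

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i X_{t-i}^2, \tag{4.18}$$

where $\alpha_0 > 0$ and $\alpha_i \ge 0$, $i = 1, \ldots, p$.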


Let $\mathcal{F}_t = \sigma(\{X_s : s \le t\})$ again denote the sigma algebra representing the history
of the process up to time $t$ so that $(\mathcal{F}_t)_{t\in\mathbb{Z}}$ is the natural filtration. Clearly, the
construction (4.18) ensures that $\sigma_t$ is measurable with respect to $\mathcal{F}_{t-1}$. This allows
us to calculate that, provided $E(|X_t|) < \infty$,

$$E(X_t \mid \mathcal{F}_{t-1}) = E(\sigma_t Z_t \mid \mathcal{F}_{t-1}) = \sigma_t E(Z_t \mid \mathcal{F}_{t-1}) = \sigma_t E(Z_t) = 0, \tag{4.19}$$

so that the ARCH process has the martingale-difference property with respect to
$(\mathcal{F}_t)_{t\in\mathbb{Z}}$. If the process is covariance stationary, it is simply a white noise, as discussed
in Section 4.2.1.


Remark 4.17. Note that the independence of $Z_t$ and $\mathcal{F}_{t-1}$ that we have assumed
above follows from the fact that an ARCH process must be causal, i.e. the
equations (4.17) and (4.18) must have a solution of the form $X_t = f(Z_t, Z_{t-1}, \ldots)$
for some $f$ so that $Z_t$ is independent of previous values of the process. This
contrasts with ARMA models where the equations can have non-causal solutions (see
Brockwell and Davis 1991, Example 3.1.2).


If we simply assume that the process is a covariance-stationary white noise (for
which we will give a condition in Proposition 4.18), then $E(X_t^2) < \infty$ and

$$\operatorname{var}(X_t \mid \mathcal{F}_{t-1}) = E(\sigma_t^2 Z_t^2 \mid \mathcal{F}_{t-1}) = \sigma_t^2 \operatorname{var}(Z_t) = \sigma_t^2.$$

Thus the model has the interesting property that its conditional standard deviation
$\sigma_t$, or volatility, is a continually changing function of the previous squared values of
the process. If one or more of $|X_{t-1}|, \ldots, |X_{t-p}|$ are particularly large, then $X_t$ is
effectively drawn from a distribution with large variance, and may itself be large; in
this way the model generates volatility clusters. The name ARCH refers to this
structure: the model is autoregressive, since $X_t$ clearly depends on previous $X_{t-i}$, and
conditionally heteroscedastic, since the conditional variance changes continually.

Figure 4.6. A simulated ARCH(1) process with Gaussian innovations and parameters
$\alpha_0 = \alpha_1 = 0.5$: (a) the realization of the process; (b) the realization of the volatility; and
correlograms of (c) the raw and (d) the squared values. The process is covariance stationary
with unit variance and a finite fourth moment (since $\alpha_1 < 1/\sqrt{3}$) and the squared values
follow an AR(1) process. The true form of the ACF of the squared values is represented by
the dashed line in the correlogram.

The distribution of the innovations $(Z_t)_{t\in\mathbb{Z}}$ can in principle be any zero-mean,
unit-variance distribution. For statistical fitting purposes we may or may not choose
to actually specify the distribution, depending on whether we implement a
maximum likelihood (ML), quasi-maximum likelihood (QML) or non-parametric
fitting method (see Section 4.3.4). For ML the most common choices are
standard normal innovations or scaled $t$ innovations. By the latter we mean that
$Z_t \sim t_1(\nu, 0, (\nu - 2)/\nu)$ in the notation of Example 3.7, so that the variance of
the distribution is one. We keep these choices in mind when discussing further
theoretical properties of ARCH and GARCH models.


The ARCH(1) model. In the rest of this section we analyse some of the properties 
of the ARCH(1) model. These properties extend to the whole class of ARCH and 
GARCH models, but are most easily introduced in the simplest case. A simulated 
realization of an ARCH(1) process with Gaussian innovations and the corresponding 
realization of the volatility process are shown in Figure 4.6. 
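
A realization of this kind can be generated directly from equations (4.17) and (4.18) with $p = 1$. The sketch below uses the parameter values quoted in the caption of Figure 4.6; the seed, the burn-in length and the starting value of zero are our own arbitrary choices, and the output is not intended to reproduce the figure itself.

```python
import math
import random


def simulate_arch1(n, alpha0, alpha1, seed=None, burn_in=500):
    """Simulate n values of an ARCH(1) process with standard normal innovations.

    Returns (x, sigma): the process and its volatility. The recursion is
    started at x = 0 and the first burn_in values are discarded.
    """
    rng = random.Random(seed)
    x_prev = 0.0
    xs, sigmas = [], []
    for t in range(n + burn_in):
        sigma = math.sqrt(alpha0 + alpha1 * x_prev ** 2)   # (4.18) with p = 1
        x_prev = sigma * rng.gauss(0.0, 1.0)               # (4.17)
        if t >= burn_in:
            xs.append(x_prev)
            sigmas.append(sigma)
    return xs, sigmas


x, sigma = simulate_arch1(1000, alpha0=0.5, alpha1=0.5, seed=1)
print(max(sigma), sum(v * v for v in x) / len(x))
```

Plotting `x` and `sigma` against time shows the volatility clusters discussed above: large absolute values of the process are followed by elevated volatility in the next period.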

Using $X_t^2 = \sigma_t^2 Z_t^2$ and (4.18) in the case $p = 1$, we deduce that the squared
ARCH(1) process satisfies

$$X_t^2 = \alpha_0 Z_t^2 + \alpha_1 Z_t^2 X_{t-1}^2. \tag{4.20}$$

A detailed mathematical analysis of the ARCH(1) model involves the study of
equation (4.20), which is a stochastic recurrence equation (SRE). Much as for the
AR(1) model in Example 4.11, we would like to know when this equation has
stationary solutions expressed in terms of the infinite history of the innovations,
i.e. solutions of the form $X_t^2 = f(Z_t, Z_{t-1}, \ldots)$.

For ARCH models we have to distinguish carefully between solutions that are 
covariance stationary and solutions that are only strictly stationary. It is possible to 
have ARCH(1) models with infinite variance, which obviously cannot be covariance 
stationary. 


Stochastic recurrence relations. The detailed theory required to analyse stochastic 
recurrence relations of the form (4.20) is outside the scope of this book, and we give 
only brief notes to indicate the ideas involved. Our treatment is based on Brandt 
(1986) and Mikosch (2003); see Notes and Comments at the end of this section for 
further references. 

Equation (4.20) is a particular example of a class of recurrence equations of the
form

$$Y_t = A_t Y_{t-1} + B_t, \tag{4.21}$$

where $(A_t)_{t\in\mathbb{Z}}$ and $(B_t)_{t\in\mathbb{Z}}$ are sequences of iid rvs. Sufficient conditions for a
solution are that

$$E(\ln^+|B_t|) < \infty \quad \text{and} \quad E(\ln|A_t|) < 0, \tag{4.22}$$

where $\ln^+ x = \max(0, \ln x)$. The unique solution is given by

$$Y_t = B_t + \sum_{i=1}^{\infty} B_{t-i} \prod_{j=0}^{i-1} A_{t-j}, \tag{4.23}$$

where the sum converges absolutely, almost surely.

We can develop some intuition for the conditions (4.22) and the form of the
solution (4.23) by iterating equation (4.21) $k$ times to obtain

$$Y_t = A_t(A_{t-1}Y_{t-2} + B_{t-1}) + B_t = \cdots = B_t + \sum_{i=1}^{k} B_{t-i} \prod_{j=0}^{i-1} A_{t-j} + Y_{t-k-1} \prod_{i=0}^{k} A_{t-i}.$$
The conditions (4.22) ensure that the middle term on the right-hand side converges
absolutely and the final term disappears. In particular, note that

$$\frac{1}{k+1} \sum_{i=0}^{k} \ln|A_{t-i}| \xrightarrow{\text{a.s.}} E(\ln|A_t|) < 0$$

by the strong law of large numbers. So

$$\prod_{i=0}^{k} |A_{t-i}| = \exp\Big(\sum_{i=0}^{k} \ln|A_{t-i}|\Big) \xrightarrow{\text{a.s.}} 0,$$

which shows the importance of the $E(\ln|A_t|) < 0$ condition. The solution (4.23) to
the SRE is a strictly stationary process (being a function of iid variables $(A_s, B_s)_{s \le t}$),
and the $E(\ln|A_t|) < 0$ condition turns out to be the key to the strict stationarity of
ARCH and GARCH models.


Stationarity of ARCH(1). The squared ARCH(1) model (4.20) is an SRE of the
form (4.21) with $A_t = \alpha_1 Z_t^2$ and $B_t = \alpha_0 Z_t^2$. Thus the conditions in (4.22) translate
into the requirements that $E(\ln^+|\alpha_0 Z_t^2|) < \infty$, which is automatically true for the
ARCH(1) process as we have defined it, and $E(\ln(\alpha_1 Z_t^2)) < 0$. This is the condition
for a strictly stationary solution of the ARCH(1) equations and it can be shown that
it is in fact a necessary and sufficient condition for strict stationarity (see Bougerol
and Picard 1992). From (4.23), the solution of equation (4.20) takes the form

$$X_t^2 = \alpha_0 \sum_{i=0}^{\infty} \alpha_1^i \prod_{j=0}^{i} Z_{t-j}^2. \tag{4.24}$$

If the $(Z_t)$ are standard normal innovations, then the condition for a strictly
stationary solution is approximately $\alpha_1 < 3.562$; perhaps somewhat surprisingly, if
the $(Z_t)$ are scaled $t$ innovations with four degrees of freedom and variance 1, the
condition is $\alpha_1 < 5.437$. Strict stationarity depends on the distribution of the
innovations but covariance stationarity does not; the necessary and sufficient condition
for covariance stationarity is always $\alpha_1 < 1$, as we now prove.
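The Gaussian threshold quoted above can be checked numerically. The sketch below is an illustration, not part of the original text: it relies on the standard identity $E(\ln Z^2) = -(\gamma + \ln 2)$ for $Z \sim N(0,1)$, where $\gamma$ is the Euler-Mascheroni constant, so that the condition $E(\ln(\alpha_1 Z_t^2)) < 0$ becomes $\alpha_1 < 2e^{\gamma}$. The function names are hypothetical, and the values 1.2, 3 and 4 are those used in Figure 4.7.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

# For Z ~ N(0, 1), E[ln(Z^2)] = -(gamma + ln 2): the mean of the log of a
# chi-squared variable with one degree of freedom.
E_LOG_Z2 = -(EULER_GAMMA + math.log(2.0))


def expected_log_a(alpha1):
    """E[ln(alpha1 * Z^2)] for standard normal Z."""
    return math.log(alpha1) + E_LOG_Z2


# Strict stationarity requires E[ln(alpha1 * Z^2)] < 0, i.e. alpha1 < 2 * e^gamma.
threshold = math.exp(-E_LOG_Z2)


def mc_expected_log_z2(n=200_000, seed=1):
    """Monte Carlo estimate of E[ln(Z^2)], used only as a sanity check."""
    rng = random.Random(seed)
    return sum(math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(n)) / n


print(f"threshold for alpha1: {threshold:.3f}")
for a in (1.2, 3.0, 4.0):
    status = "strictly stationary" if expected_log_a(a) < 0 else "explosive"
    print(f"alpha1 = {a}: {status}")
```

Under these assumptions the values 1.2 and 3 fall below the threshold while 4 lies above it, in line with the three panels of Figure 4.7.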


Proposition 4.18. The ARCH(1) process is a covariance-stationary white noise
process if and only if $\alpha_1 < 1$. The variance of the covariance-stationary process is
given by $\alpha_0/(1 - \alpha_1)$.


Proof. Assuming covariance stationarity it follows from (4.20) and $E(Z_t^2) = 1$ that

$$\sigma_x^2 = E(X_t^2) = \alpha_0 + \alpha_1 E(X_{t-1}^2) = \alpha_0 + \alpha_1 \sigma_x^2.$$

Clearly, $\sigma_x^2 = \alpha_0/(1 - \alpha_1)$ and we must have $\alpha_1 < 1$.
Conversely, if $\alpha_1 < 1$, then, by Jensen's inequality,

$$E(\ln(\alpha_1 Z_t^2)) \le \ln(E(\alpha_1 Z_t^2)) = \ln(\alpha_1) < 0,$$

and we can use (4.24) to calculate that

$$E(X_t^2) = \alpha_0 \sum_{i=0}^{\infty} \alpha_1^i = \frac{\alpha_0}{1 - \alpha_1}.$$

The process $(X_t)_{t\in\mathbb{Z}}$ is a martingale difference with a finite, non-time-dependent
second moment. Hence it is a white noise process.

Figure 4.7. (a), (b) Strictly stationary ARCH(1) models with Gaussian innovations which
are not covariance stationary ($\alpha_1 = 1.2$ and $\alpha_1 = 3$, respectively). (c) A non-stationary
(explosive) process generated by the ARCH(1) equations with $\alpha_1 = 4$. Note that (b) and (c)
use a special double-logarithmic y-axis where all values less than one in modulus are plotted
at zero.


See Figure 4.7 for examples of non-covariance-stationary ARCH(1) models 
as well as an example of a non-stationary (explosive) process generated by the 
ARCH(1) equations. The process in Figure 4.6 is covariance stationary. 


On the stationary distribution of $X_t$. It is clear from (4.24) that the distribution of
the $(X_t)$ in an ARCH(1) model bears a complicated relationship to the distribution of
the innovations $(Z_t)$. Even if the innovations are Gaussian, the stationary distribution
of the time series is not Gaussian, but rather a leptokurtic distribution with more
slowly decaying tails.

Moreover, from (4.17) we see that the distribution of $X_t$ is a normal mixture
distribution of the kind discussed in Section 3.2. Its distribution depends on the
distribution of $\sigma_t$, which has no simple form.


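Proposition 4.19. For $m \ge 1$, the strictly stationary ARCH(1) process has finite
moments of order $2m$ if and only if $E(Z_t^{2m}) < \infty$ and $\alpha_1 < (E(Z_t^{2m}))^{-1/m}$.

Proof. We rewrite (4.24) in the form $X_t^2 = Z_t^2 \sum_{i=0}^{\infty} Y_{t,i}$ for positive rvs
$Y_{t,i} = \alpha_0 \alpha_1^i \prod_{j=1}^{i} Z_{t-j}^2$, $i \ge 1$, and $Y_{t,0} = \alpha_0$. For $m \ge 1$ the following inequalities hold
(the latter being Minkowski's inequality):

$$E(Y_{t,1}^m) + E(Y_{t,2}^m) \le E((Y_{t,1} + Y_{t,2})^m) \le \bigl((E(Y_{t,1}^m))^{1/m} + (E(Y_{t,2}^m))^{1/m}\bigr)^m.$$

Since

$$E(X_t^{2m}) = E(Z_t^{2m})\, E\Bigl(\Bigl(\sum_{i=0}^{\infty} Y_{t,i}\Bigr)^m\Bigr),$$

it follows that

$$E(Z_t^{2m}) \sum_{i=0}^{\infty} E(Y_{t,i}^m) \le E(X_t^{2m}) \le E(Z_t^{2m}) \Bigl(\sum_{i=0}^{\infty} (E(Y_{t,i}^m))^{1/m}\Bigr)^m.$$

Since $E(Y_{t,i}^m) = \alpha_0^m \alpha_1^{im} (E(Z_t^{2m}))^i$, it may be deduced that all three quantities are
finite if and only if $E(Z_t^{2m}) < \infty$ and $\alpha_1^m E(Z_t^{2m}) < 1$.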


For example, for a finite fourth moment ($m = 2$) we require $\alpha_1 < 1/\sqrt{3}$ in the
case of Gaussian innovations and $\alpha_1 < 1/\sqrt{6}$ in the case of $t$ innovations with
six degrees of freedom; for $t$ innovations with four degrees of freedom the fourth
moment is undefined.

Assuming the existence of a finite fourth moment, it is easy to calculate its value,
and also that of the kurtosis of the process. We square both sides of (4.20), take
expectations of both sides and then solve for $E(X_t^4)$ to obtain

$$E(X_t^4) = \frac{\alpha_0^2 E(Z_t^4)(1 - \alpha_1^2)}{(1 - \alpha_1)^2 (1 - \alpha_1^2 E(Z_t^4))}.$$

The kurtosis of the stationary distribution $\kappa_X$ can then be calculated to be

$$\kappa_X = \frac{E(X_t^4)}{E(X_t^2)^2} = \frac{\kappa_Z (1 - \alpha_1^2)}{1 - \alpha_1^2 \kappa_Z},$$

where $\kappa_Z = E(Z_t^4)$ denotes the kurtosis of the innovations. Clearly, when $\kappa_Z > 1$,
the kurtosis of the stationary distribution is inflated in comparison with that of the
innovation distribution; for Gaussian or $t$ innovations $\kappa_X > 3$, so the stationary
distribution is leptokurtic. The kurtosis of the process in Figure 4.6 is 9.


Parallels with the AR(1) process. We now turn our attention to the serial dependence
structure of the squared series in the case of covariance stationarity ($\alpha_1 < 1$).
We write the squared process as

$$X_t^2 = \sigma_t^2 Z_t^2 = \sigma_t^2 + \sigma_t^2 (Z_t^2 - 1). \tag{4.25}$$

Setting $V_t = \sigma_t^2 (Z_t^2 - 1)$ we note that $(V_t)_{t\in\mathbb{Z}}$ forms a martingale difference series,
since $E|V_t| < \infty$ and $E(V_t \mid \mathcal{F}_{t-1}) = \sigma_t^2 E(Z_t^2 - 1) = 0$. Now we rewrite (4.25) as
$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + V_t$, and observe that this closely resembles an AR(1) process
for $X_t^2$, except that $V_t$ is not necessarily a white noise process. If we restrict our
attention to processes where $E(X_t^4)$ is finite, then $V_t$ has a finite and constant second
moment and is a white noise process. Under this assumption, $X_t^2$ is an AR(1)
according to Definition 4.7 of the form

$$\Bigl(X_t^2 - \frac{\alpha_0}{1 - \alpha_1}\Bigr) = \alpha_1 \Bigl(X_{t-1}^2 - \frac{\alpha_0}{1 - \alpha_1}\Bigr) + V_t.$$

It has mean $\alpha_0/(1 - \alpha_1)$ and we can use Example 4.11 to conclude that the autocorrelation
function is $\rho(h) = \alpha_1^{|h|}$, $h \in \mathbb{Z}$. Figure 4.6 shows an example of an ARCH(1)
process with finite fourth moment whose squared values follow an AR(1) process.


4.3.2 GARCH Processes 


Definition 4.20. Let $(Z_t)_{t\in\mathbb{Z}}$ be SWN(0, 1). The process $(X_t)_{t\in\mathbb{Z}}$ is a GARCH($p$, $q$)
process if it is strictly stationary and if it satisfies, for all $t \in \mathbb{Z}$ and some strictly
positive-valued process $(\sigma_t)_{t\in\mathbb{Z}}$, the equations

$$X_t = \sigma_t Z_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i X_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2, \tag{4.26}$$

where $\alpha_0 > 0$, $\alpha_i \ge 0$, $i = 1, \dots, p$, and $\beta_j \ge 0$, $j = 1, \dots, q$.


The GARCH processes are generalized ARCH processes in the sense that the
squared volatility $\sigma_t^2$ is allowed to depend on previous squared volatilities, as well
as previous squared values of the process.


The GARCH(1, 1) model. In practice, low-order GARCH models are most widely
used and we will concentrate on the GARCH(1, 1) model. In this model periods of
high volatility tend to be persistent, since $|X_t|$ has a chance of being large if either
$|X_{t-1}|$ is large or $\sigma_{t-1}$ is large; the same effect can be achieved in ARCH models of
high order, but lower-order GARCH models achieve this effect more parsimoniously.
A simulated realization of a GARCH(1, 1) process with Gaussian innovations and
its volatility are shown in Figure 4.8; in comparison with the ARCH(1) model of
Figure 4.6 it is clear that the volatility persists longer at higher levels before decaying
to lower levels.
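The kind of simulation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the procedure used for the book's figures: innovations are standard normal, the recursion is started at the stationary variance $\alpha_0/(1-\alpha_1-\beta)$, the function name `simulate_garch11` is hypothetical, and the parameters take $\alpha_1$ and $\beta$ from the Figure 4.8 caption with $\alpha_0$ chosen here so that the stationary variance is one.

```python
import math
import random


def simulate_garch11(n, alpha0, alpha1, beta, seed=0):
    """Simulate n values of a GARCH(1, 1) process with N(0, 1) innovations.

    Returns (x, sigma), two lists of length n holding the process and its
    volatility. The recursion starts at the stationary variance, which
    requires alpha1 + beta < 1 (an illustrative choice of starting value).
    """
    if alpha0 <= 0 or alpha1 < 0 or beta < 0 or alpha1 + beta >= 1:
        raise ValueError("need alpha0 > 0, alpha1 >= 0, beta >= 0, alpha1 + beta < 1")
    rng = random.Random(seed)
    sigma2 = alpha0 / (1.0 - alpha1 - beta)           # sigma_0^2
    x_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)  # X_0
    x, sigma = [], []
    for _ in range(n):
        # sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2 + beta * sigma_{t-1}^2
        sigma2 = alpha0 + alpha1 * x_prev ** 2 + beta * sigma2
        s = math.sqrt(sigma2)
        x_prev = s * rng.gauss(0.0, 1.0)              # X_t = sigma_t * Z_t
        x.append(x_prev)
        sigma.append(s)
    return x, sigma


x, sigma = simulate_garch11(20_000, alpha0=0.05, alpha1=0.1, beta=0.85)
sample_var = sum(v * v for v in x) / len(x)
stationary_var = 0.05 / (1.0 - 0.1 - 0.85)
print(f"sample variance {sample_var:.3f}, stationary variance {stationary_var:.3f}")
```

Plotting `x` and `sigma` against time should show volatility clusters that decay slowly, since $\alpha_1 + \beta = 0.95$ is close to one.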


Stationarity. It follows from (4.26) that for a GARCH(1, 1) model we have

$$\sigma_t^2 = \alpha_0 + (\alpha_1 Z_{t-1}^2 + \beta)\,\sigma_{t-1}^2, \tag{4.27}$$

which is again an SRE of the form $Y_t = A_t Y_{t-1} + B_t$, as in (4.21). This time it is an
SRE for $Y_t = \sigma_t^2$ rather than $X_t^2$, but its analysis follows easily from the ARCH(1)
case.

The condition $E(\ln|A_t|) < 0$ for a strictly stationary solution of (4.21) translates
to the condition $E(\ln(\alpha_1 Z_t^2 + \beta)) < 0$ for (4.27) and the general solution (4.23)
becomes

$$\sigma_t^2 = \alpha_0 + \alpha_0 \sum_{i=1}^{\infty} \prod_{j=1}^{i} (\alpha_1 Z_{t-j}^2 + \beta). \tag{4.28}$$

If $(\sigma_t^2)_{t\in\mathbb{Z}}$ is a strictly stationary process, then so is $(X_t)_{t\in\mathbb{Z}}$, since $X_t = \sigma_t Z_t$ and
$(Z_t)_{t\in\mathbb{Z}}$ is simply strict white noise. The solution of the GARCH(1, 1) defining
equations is then

$$X_t = Z_t \sqrt{\alpha_0 \Bigl(1 + \sum_{i=1}^{\infty} \prod_{j=1}^{i} (\alpha_1 Z_{t-j}^2 + \beta)\Bigr)}, \tag{4.29}$$

and we can use this to derive the condition for covariance stationarity.


Proposition 4.21. The GARCH(1, 1) process is a covariance-stationary white noise
process if and only if $\alpha_1 + \beta < 1$. The variance of the covariance-stationary process
is given by $\alpha_0/(1 - \alpha_1 - \beta)$.


Proof. We use a similar argument to Proposition 4.18 and make use of (4.29).


Fourth moments and kurtosis. Using a similar approach to Proposition 4.19 we can
use (4.29) to derive conditions for the existence of higher moments of a covariance-stationary
GARCH(1, 1) process. For the existence of a fourth moment, a necessary
and sufficient condition is that $E((\alpha_1 Z_t^2 + \beta)^2) < 1$, or alternatively that

$$(\alpha_1 + \beta)^2 < 1 - (\kappa_Z - 1)\alpha_1^2.$$

Assuming this to be true we calculate the fourth moment and kurtosis of $X_t$. We
square both sides of (4.27) and take expectations to obtain

$$E(\sigma_t^4) = \alpha_0^2 + (\alpha_1^2 \kappa_Z + \beta^2 + 2\alpha_1\beta) E(\sigma_t^4) + 2\alpha_0(\alpha_1 + \beta) E(\sigma_t^2).$$

Solving for $E(\sigma_t^4)$, recalling that $E(\sigma_t^2) = E(X_t^2) = \alpha_0/(1 - \alpha_1 - \beta)$, and setting
$E(X_t^4) = \kappa_Z E(\sigma_t^4)$ we obtain

$$E(X_t^4) = \frac{\alpha_0^2 \kappa_Z (1 - (\alpha_1 + \beta)^2)}{(1 - \alpha_1 - \beta)^2 (1 - \alpha_1^2 \kappa_Z - \beta^2 - 2\alpha_1\beta)},$$

from which it follows that

$$\kappa_X = \frac{\kappa_Z (1 - (\alpha_1 + \beta)^2)}{1 - (\alpha_1 + \beta)^2 - (\kappa_Z - 1)\alpha_1^2}.$$

Again it is clear that the kurtosis of $X_t$ is greater than that of $Z_t$ whenever $\kappa_Z > 1$,
such as for Gaussian and scaled $t$ innovations. The kurtosis of the GARCH(1, 1)
model in Figure 4.8 is 3.77.
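As a quick check of the kurtosis formula, the sketch below evaluates it for Gaussian innovations ($\kappa_Z = 3$). The function name is hypothetical. The first call uses the $\alpha_1$ and $\beta$ values stated in the caption of Figure 4.8; the second sets $\beta = 0$ to recover the ARCH(1) case, with $\alpha_1 = 0.5$ taken as an assumption because the parameters of Figure 4.6 are not given in this section.

```python
def garch11_kurtosis(alpha1, beta, kappa_z=3.0):
    """Kurtosis of a stationary GARCH(1, 1) process with finite fourth moment.

    kappa_z is the kurtosis E(Z^4) of the unit-variance innovations
    (3 for Gaussian innovations). Setting beta = 0 gives the ARCH(1) case.
    """
    persistence_sq = (alpha1 + beta) ** 2
    denom = 1.0 - persistence_sq - (kappa_z - 1.0) * alpha1 ** 2
    if denom <= 0.0:
        # Fourth-moment condition (alpha1 + beta)^2 < 1 - (kappa_z - 1) alpha1^2 fails.
        raise ValueError("fourth moment is not finite for these parameters")
    return kappa_z * (1.0 - persistence_sq) / denom


print(round(garch11_kurtosis(0.1, 0.85), 2))  # 3.77
print(round(garch11_kurtosis(0.5, 0.0), 2))   # 9.0
```

The two results agree with the kurtosis values of 3.77 and 9 quoted in the text for Figures 4.8 and 4.6, the latter only under the assumed value of $\alpha_1$.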


Parallels with the ARMA(1, 1) process. Using the same representation as in equation
(4.25), the covariance-stationary GARCH(1, 1) process may be written as

$$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta \sigma_{t-1}^2 + V_t,$$

where $V_t$ is a martingale difference, given by $V_t = \sigma_t^2 (Z_t^2 - 1)$. Since
$\sigma_{t-1}^2 = X_{t-1}^2 - V_{t-1}$, we may write

$$X_t^2 = \alpha_0 + (\alpha_1 + \beta) X_{t-1}^2 - \beta V_{t-1} + V_t, \tag{4.30}$$

which begins to resemble an ARMA(1, 1) process for $X_t^2$. If we further assume that
$E(X_t^4) < \infty$, then, recalling that $\alpha_1 + \beta < 1$, we have formally that

$$\Bigl(X_t^2 - \frac{\alpha_0}{1 - \alpha_1 - \beta}\Bigr) = (\alpha_1 + \beta)\Bigl(X_{t-1}^2 - \frac{\alpha_0}{1 - \alpha_1 - \beta}\Bigr) - \beta V_{t-1} + V_t$$

is an ARMA(1, 1) process. Figure 4.8 shows an example of a GARCH(1, 1) process
with finite fourth moment whose squared values follow an ARMA(1, 1) process.

Figure 4.8. A GARCH(1, 1) process with Gaussian innovations and parameters $\alpha_0 = 0.5$,
$\alpha_1 = 0.1$, $\beta = 0.85$: (a) the realization of the process; (b) the realization of the volatility; and
correlograms of (c) the raw and (d) the squared values. The process is covariance stationary
with unit variance and a finite fourth moment and the squared values follow an ARMA(1, 1)
process. The true form of the ACF of the squared values is shown by a dashed line in the
correlogram.


The GARCH($p$, $q$) model. Higher-order ARCH and GARCH models have the
same general behaviour as ARCH(1) and GARCH(1, 1), but their mathematical
analysis becomes more tedious. The condition for a strictly stationary solution of the
defining SRE has been derived by Bougerol and Picard (1992), but is complicated.
The necessary and sufficient condition that this solution is covariance stationary is

$$\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1.$$

A squared GARCH($p$, $q$) process has the structure

$$X_t^2 = \alpha_0 + \sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) X_{t-i}^2 - \sum_{j=1}^{q} \beta_j V_{t-j} + V_t,$$

where $\alpha_i = 0$ for $i = p + 1, \dots, q$ if $q > p$, or $\beta_j = 0$ for $j = q + 1, \dots, p$ if
$p > q$. This resembles the ARMA($\max(p, q)$, $q$) process and is formally such a
process provided $E(X_t^4) < \infty$.


Integrated GARCH. The study of integrated GARCH (or IGARCH) processes
has been motivated by the fact that, in some applications of GARCH modelling
to daily or higher-frequency risk-factor return series, the estimated ARCH and
GARCH coefficients $(\alpha_1, \dots, \alpha_p, \beta_1, \dots, \beta_q)$ are observed to sum to a number
very close to one, and sometimes even slightly larger than one. In a model
where $\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j \ge 1$, the process has infinite variance and is thus
non-covariance-stationary. The special case where $\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j = 1$ is known
as IGARCH and has received some attention.

For simplicity consider the IGARCH(1, 1) model. We use (4.30) to conclude that
the squared process must satisfy

$$\nabla X_t^2 = X_t^2 - X_{t-1}^2 = \alpha_0 - (1 - \alpha_1) V_{t-1} + V_t,$$

where $V_t$ is a noise sequence defined by $V_t = \sigma_t^2 (Z_t^2 - 1)$ and
$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + (1 - \alpha_1)\sigma_{t-1}^2$. This equation is reminiscent of an ARIMA(0, 1, 1) model (see (4.14))
for $X_t^2$, although the noise $V_t$ is not white noise, nor is it strictly speaking a martingale
difference according to Definition 4.6. $E(V_t \mid \mathcal{F}_{t-1})$ is undefined since $E(\sigma_t^2) =
E(X_t^2) = \infty$, and therefore $E|V_t|$ is undefined.


4.3.3 Simple Extensions of the GARCH Model 


Many variants on and extensions of the basic GARCH model have been proposed. 
We mention only a few (see Notes and Comments for further reading). 


ARMA models with GARCH errors. We have seen that ARMA processes are driven
by a white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$ and that a covariance-stationary GARCH process is an
example of a white noise. In this section we put the ARMA and GARCH models
together by setting the ARMA error $\varepsilon_t$ equal to $\sigma_t Z_t$, where $\sigma_t$ follows a GARCH
volatility specification in terms of historical values of $\varepsilon_t$. This gives us a flexible
family of ARMA models with GARCH errors that combines the features of both
model classes.


Definition 4.22. Let $(Z_t)_{t\in\mathbb{Z}}$ be SWN(0, 1). The process $(X_t)_{t\in\mathbb{Z}}$ is said to be an
ARMA($p_1$, $q_1$) process with GARCH($p_2$, $q_2$) errors if it is covariance stationary
and satisfies difference equations of the form

$$X_t = \mu_t + \sigma_t Z_t,$$

$$\mu_t = \mu + \sum_{i=1}^{p_1} \phi_i (X_{t-i} - \mu) + \sum_{j=1}^{q_1} \theta_j (X_{t-j} - \mu_{t-j}),$$

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p_2} \alpha_i (X_{t-i} - \mu_{t-i})^2 + \sum_{j=1}^{q_2} \beta_j \sigma_{t-j}^2,$$

where $\alpha_0 > 0$, $\alpha_i \ge 0$, $i = 1, \dots, p_2$, $\beta_j \ge 0$, $j = 1, \dots, q_2$, and
$\sum_{i=1}^{p_2} \alpha_i + \sum_{j=1}^{q_2} \beta_j < 1$.

To be consistent with the previous definition of an ARMA process we build the
covariance-stationarity condition for the GARCH errors into the definition. For the
ARMA process to be a causal and invertible linear process, as before, the polynomials
$\tilde\phi(z) = 1 - \phi_1 z - \cdots - \phi_{p_1} z^{p_1}$ and $\tilde\theta(z) = 1 + \theta_1 z + \cdots + \theta_{q_1} z^{q_1}$ should have no
common roots and no roots inside the unit circle.

Let $(\mathcal{F}_t)_{t\in\mathbb{Z}}$ denote the natural filtration of $(X_t)_{t\in\mathbb{Z}}$ and assume that the ARMA
model is invertible. The invertibility of the ARMA process ensures that $\mu_t$ is
$\mathcal{F}_{t-1}$-measurable as in (4.13). Moreover, since $\sigma_t$ depends on the infinite history
$(X_s - \mu_s)_{s \le t-1}$, the ARMA invertibility also ensures that $\sigma_t$ is $\mathcal{F}_{t-1}$-measurable.
Simple calculations show that $\mu_t = E(X_t \mid \mathcal{F}_{t-1})$ and $\sigma_t^2 = \operatorname{var}(X_t \mid \mathcal{F}_{t-1})$, so
that $\mu_t$ and $\sigma_t^2$ are the conditional mean and variance of the new process.


GARCH with leverage. One of the main criticisms of the standard ARCH and 
GARCH models is the rigidly symmetric way in which the volatility reacts to recent 
returns, regardless of their sign. Economic theory suggests that market information 
should have an asymmetric effect on volatility, whereby bad news leading to a fall 
in the equity value of a company tends to increase the volatility. This phenomenon 
has been called a leverage effect, because a fall in equity value causes an increase in 
the debt-to-equity ratio or so-called leverage of a company and should consequently 
make the stock more volatile. At a less theoretical level it seems reasonable that 
falling stock values might lead to a higher level of investor nervousness than rises 
in value of the same magnitude. 

One method of adding a leverage effect to a GARCH(1, 1) model is by introducing
an additional parameter into the volatility equation (4.26) to get

$$\sigma_t^2 = \alpha_0 + \alpha_1 (X_{t-1} + \delta |X_{t-1}|)^2 + \beta \sigma_{t-1}^2. \tag{4.31}$$

We assume that $\delta \in [-1, 1]$ and $\alpha_1 \ge 0$ as in the GARCH(1, 1) model. Observe
that (4.31) may be written as

$$\sigma_t^2 = \begin{cases}
\alpha_0 + \alpha_1 (1 + \delta)^2 X_{t-1}^2 + \beta \sigma_{t-1}^2, & X_{t-1} \ge 0, \\
\alpha_0 + \alpha_1 (1 - \delta)^2 X_{t-1}^2 + \beta \sigma_{t-1}^2, & X_{t-1} < 0,
\end{cases}$$

and hence that

$$\frac{\partial \sigma_t^2}{\partial X_{t-1}^2} = \begin{cases}
\alpha_1 (1 + \delta)^2, & X_{t-1} \ge 0, \\
\alpha_1 (1 - \delta)^2, & X_{t-1} < 0.
\end{cases}$$

The response of volatility to the magnitude of the most recent return depends on
the sign of that return, and we generally expect $\delta < 0$, so bad news has the greater
effect.


Threshold GARCH. Observe that (4.31) may easily be rewritten in the form

$$\sigma_t^2 = \alpha_0 + \tilde\alpha_1 X_{t-1}^2 + \tilde\delta\, I_{\{X_{t-1} < 0\}} X_{t-1}^2 + \beta \sigma_{t-1}^2, \tag{4.32}$$

where $\tilde\alpha_1 = \alpha_1 (1 + \delta)^2$ and $\tilde\delta = -4\delta\alpha_1$. Equation (4.32) gives the most common
version of a threshold GARCH (or TGARCH) model. In effect, a threshold has been
set at level zero, and at time $t$ the dynamics depend on whether the previous value of
the process $X_{t-1}$ (or innovation $Z_{t-1}$) was below or above this threshold. However,
it is also possible to set non-zero thresholds in TGARCH models, so this represents
a more general class of model than GARCH with leverage.
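The equivalence between the leverage form (4.31) and the threshold form (4.32) can be confirmed numerically. The sketch below is illustrative only: the function names are hypothetical, and the parameter values are the point estimates reported in Table 4.2, used here simply as a convenient example.

```python
def leverage_update(x_prev, sigma2_prev, alpha0, alpha1, beta, delta):
    """One step of the leverage recursion (4.31)."""
    return alpha0 + alpha1 * (x_prev + delta * abs(x_prev)) ** 2 + beta * sigma2_prev


def tgarch_update(x_prev, sigma2_prev, alpha0, alpha1, beta, delta):
    """The same step written in the threshold form (4.32)."""
    alpha1_tilde = alpha1 * (1.0 + delta) ** 2
    delta_tilde = -4.0 * delta * alpha1
    indicator = 1.0 if x_prev < 0 else 0.0
    return (alpha0 + alpha1_tilde * x_prev ** 2
            + delta_tilde * indicator * x_prev ** 2 + beta * sigma2_prev)


# Estimates from Table 4.2 (alpha0, alpha1, beta, delta).
params = dict(alpha0=7.799e-5, alpha1=0.108, beta=0.778, delta=-0.178)
sigma2_prev = 6e-4  # hypothetical previous squared volatility

for x_prev in (-0.03, 0.0, 0.03):
    a = leverage_update(x_prev, sigma2_prev, **params)
    b = tgarch_update(x_prev, sigma2_prev, **params)
    print(f"x = {x_prev:+.2f}: leverage {a:.6e}, threshold {b:.6e}")
```

With $\delta < 0$ the negative return produces the larger updated variance, which is the asymmetry the leverage term is meant to capture.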

In a less common version of threshold GARCH the coefficients of the GARCH 
effects depend on the signs of previous values of the process; this gives a first-order 
process of the form 


$$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta \sigma_{t-1}^2 + \delta\, I_{\{X_{t-1} < 0\}} \sigma_{t-1}^2. \tag{4.33}$$

Remark 4.23. Note, also, that a further way to introduce asymmetry into a GARCH
model is to explicitly use an asymmetric innovation distribution (albeit normalized 
to have mean zero and variance one). Candidate distributions could come from the 
generalized hyperbolic family of Section 3.2.3. 


4.3.4 Fitting GARCH Models to Data 


Building the likelihood. In practice, the most widely used approach to fitting 
GARCH models to data is maximum likelihood. We consider in turn the fitting of the 
ARCH(1) and GARCH(1, 1) models, from which the fitting of general ARCH(p) 
and GARCH(p, q) models easily follows. 

For the ARCH(1) and GARCH(1, 1) models suppose we have a total of $n + 1$
data values $X_0, X_1, \dots, X_n$. It is useful to recall that we can write the joint density
of the corresponding rvs as

$$f_{X_0,\dots,X_n}(x_0, \dots, x_n) = f_{X_0}(x_0) \prod_{t=1}^{n} f_{X_t \mid X_{t-1},\dots,X_0}(x_t \mid x_{t-1}, \dots, x_0). \tag{4.34}$$

For the pure ARCH(1) process, which is first-order Markovian, the conditional
densities $f_{X_t \mid X_{t-1},\dots,X_0}$ in (4.34) depend on the past only through the value of $\sigma_t$ or,
equivalently, $X_{t-1}$. The conditional density is easily calculated to be

$$f_{X_t \mid X_{t-1},\dots,X_0}(x_t \mid x_{t-1}, \dots, x_0) = f_{X_t \mid X_{t-1}}(x_t \mid x_{t-1}) = \frac{1}{\sigma_t}\, g\Bigl(\frac{x_t}{\sigma_t}\Bigr), \tag{4.35}$$

where $\sigma_t = (\alpha_0 + \alpha_1 x_{t-1}^2)^{1/2}$ and $g(z)$ denotes the density of the innovations
$(Z_t)_{t\in\mathbb{Z}}$. We recall that this must have mean zero and variance one and typical
choices would be the standard normal density or the density of a $t$ distribution
scaled to have unit variance.
However, the marginal density $f_{X_0}$ in (4.34) is not known in a tractable closed
form for ARCH and GARCH models and this poses a problem for basing a likelihood
on (4.34). The solution employed in practice is to construct the conditional likelihood
given $X_0$, which is calculated from

$$f_{X_1,\dots,X_n \mid X_0}(x_1, \dots, x_n \mid x_0) = \prod_{t=1}^{n} f_{X_t \mid X_{t-1},\dots,X_0}(x_t \mid x_{t-1}, \dots, x_0). \tag{4.36}$$

For the ARCH(1) model this follows from (4.35) and is

$$L(\alpha_0, \alpha_1; X) = f_{X_1,\dots,X_n \mid X_0}(X_1, \dots, X_n \mid X_0) = \prod_{t=1}^{n} \frac{1}{\sigma_t}\, g\Bigl(\frac{X_t}{\sigma_t}\Bigr),$$

with $\sigma_t = (\alpha_0 + \alpha_1 X_{t-1}^2)^{1/2}$. For an ARCH($p$) model we would use analogous
arguments to write down a likelihood conditional on the first $p$ values.

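In the GARCH(1, 1) model $\sigma_t$ is recursively defined in terms of $\sigma_{t-1}$, and here,
instead of using (4.36), we construct the joint density of $X_1, \dots, X_n$ conditional on
realized values of both $X_0$ and $\sigma_0$, which is

$$f_{X_1,\dots,X_n \mid X_0,\sigma_0}(x_1, \dots, x_n \mid x_0, \sigma_0) = \prod_{t=1}^{n} f_{X_t \mid X_{t-1},\dots,X_0,\sigma_0}(x_t \mid x_{t-1}, \dots, x_0, \sigma_0).$$

The conditional densities $f_{X_t \mid X_{t-1},\dots,X_0,\sigma_0}$ depend on the past only through the value
of $\sigma_t$, which is given recursively from $\sigma_0, X_0, \dots, X_{t-1}$ using
$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta \sigma_{t-1}^2$. This gives us the conditional likelihood

$$L(\alpha_0, \alpha_1, \beta; X) = \prod_{t=1}^{n} \frac{1}{\sigma_t}\, g\Bigl(\frac{X_t}{\sigma_t}\Bigr), \qquad \sigma_t = \sqrt{\alpha_0 + \alpha_1 X_{t-1}^2 + \beta \sigma_{t-1}^2}.$$

The problem remains that the value of $\sigma_0^2$ is not actually observed, and this is usually
solved by choosing a starting value, such as the sample variance of $X_1, \dots, X_n$, or
even simply zero.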
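The recursive construction of the conditional likelihood translates directly into code. The sketch below is a minimal illustration assuming Gaussian innovations, so that $g$ is the standard normal density; the function name and argument layout are hypothetical, and the starting value for the squared volatility is supplied by the caller.

```python
import math


def garch11_gaussian_loglik(params, x, sigma2_0):
    """Conditional Gaussian log-likelihood of X_1..X_n given X_0 and sigma_0^2.

    params   -- (alpha0, alpha1, beta)
    x        -- sequence X_0, X_1, ..., X_n (n + 1 values)
    sigma2_0 -- starting value for the squared volatility
    """
    alpha0, alpha1, beta = params
    sigma2 = sigma2_0
    loglik = 0.0
    for t in range(1, len(x)):
        # sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2 + beta * sigma_{t-1}^2
        sigma2 = alpha0 + alpha1 * x[t - 1] ** 2 + beta * sigma2
        # log of (1 / sigma_t) * g(X_t / sigma_t) with g the N(0, 1) density
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(sigma2)
                          + x[t] ** 2 / sigma2)
    return loglik


# Tiny worked case: sigma_1^2 = 0.5 + 0.5 * 1 + 0 = 1, so the single term is
# the standard normal log-density evaluated at 1.
value = garch11_gaussian_loglik((0.5, 0.5, 0.0), [1.0, 1.0], sigma2_0=1.0)
print(value)
```

In practice a function of this kind would be passed, with its sign reversed, to a numerical optimizer, with the positivity and stationarity constraints on the parameters handled separately.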

For a GARCH($p$, $q$) model we would assume that we had $n + p$ data values
labelled $X_{-p+1}, \dots, X_0, X_1, \dots, X_n$. We would evaluate the likelihood conditional
on the (observed) values of $X_{-p+1}, \dots, X_0$ as well as the (unobserved) values of
$\sigma_{-q+1}, \dots, \sigma_0$, for which starting values would be used as above. For example, if
$p = 1$ and $q = 3$, we require starting values for $\sigma_0$, $\sigma_{-1}$ and $\sigma_{-2}$.

A similar approach can be used to develop a likelihood for an ARMA model with
GARCH errors. In this case we would end up with a conditional likelihood of the
form

$$L(\theta; X) = \prod_{t=1}^{n} \frac{1}{\sigma_t}\, g\Bigl(\frac{X_t - \mu_t}{\sigma_t}\Bigr),$$

where $\sigma_t$ follows a GARCH specification and $\mu_t$ follows an ARMA specification
as in Definition 4.22, and all unknown parameters (possibly including unknown
parameters of the innovation distribution) have been collected in the vector $\theta$. We
could of course also consider models with leverage or threshold effects.
Deriving parameter estimates. Consider, then, a log-likelihood of the form

$$\ln L(\theta; X) = \sum_{t=1}^{n} l_t(\theta), \tag{4.37}$$

where $l_t$ denotes the log-likelihood contribution arising from the $t$th observation.
The maximum likelihood estimate $\hat\theta$ maximizes the (conditional) log-likelihood
in (4.37) and, being in general a local maximum, solves the likelihood equations

$$\frac{\partial}{\partial\theta} \ln L(\theta; X) = \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial\theta} = 0, \tag{4.38}$$

where the left-hand side is also known as the score vector of the conditional likelihood.
The equations (4.38) are usually solved numerically using so-called modified
Newton-Raphson procedures. A particular method which is widely used for
GARCH models is the BHHH method of Berndt, Hall, Hall and Hausman.

In describing the behaviour of parameter estimates in the following paragraphs, 
we distinguish two situations. In the first situation we assume that the model that has 
been fitted has been correctly specified, so that the data are truly generated by a time 
series model with both the assumed dynamic form and innovation distribution. We 
describe the asymptotic behaviour of the maximum likelihood estimates (MLEs) 
under this idealization. 

In the second situation we assume that the correct dynamic form is fitted but that 
the innovations are erroneously assumed to be Gaussian. Under this misspecification 
the model fitting procedure is known as quasi-maximum likelihood (QML) and the 
estimates obtained are QMLEs. Essentially, the Gaussian likelihood is treated as an 
objective function to be maximized rather than a proper likelihood; our intuition 
suggests that this may still give reasonable parameter estimates and this turns out to 
be the case under appropriate assumptions about the true innovation distribution. 


Properties of MLEs. It helps to recall at this point the asymptotic distribution 
theory for MLEs in the classical iid case, which is summarized in Section A.3. The 
asymptotic results we give for GARCH models have a similar form to the results 
in the iid case, but it is important to realize that this is not simply an application 
of these results. The asymptotics have been separately and laboriously derived in a 
series of papers for which starting references are given in Notes and Comments. We 
will give results for pure GARCH models without ARMA components or additional 
leverage structure, which have been studied rigorously, but the form of the results 
will apply more generally. 

For a pure GARCH($p$, $q$) model with Gaussian innovations it can be shown that
(assuming the model has been correctly specified)

$$\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N_{p+q+1}(0, I(\theta)^{-1}),$$

where

$$I(\theta) = E\Bigl(\frac{\partial l_t(\theta)}{\partial\theta} \frac{\partial l_t(\theta)}{\partial\theta'}\Bigr) = -E\Bigl(\frac{\partial^2 l_t(\theta)}{\partial\theta\,\partial\theta'}\Bigr) \tag{4.39}$$

is the Fisher information matrix arising from any single observation. Thus we have
consistent and asymptotically normal estimates of the GARCH parameters. In practice,
the expected information matrix $I(\theta)$ is approximated by an observed information
matrix, and here we could take the observed information matrix coming from
either of the equivalent forms for the expected information matrix in (4.39). That is,
we could use

$$\bar I(\theta) = \frac{1}{n} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial\theta} \frac{\partial l_t(\theta)}{\partial\theta'} \quad\text{or}\quad \bar J(\theta) = -\frac{1}{n} \sum_{t=1}^{n} \frac{\partial^2 l_t(\theta)}{\partial\theta\,\partial\theta'}, \tag{4.40}$$

where the first matrix is said to have outer-product form and the second is said to have
Hessian form. These matrices are estimated by evaluating them at the MLEs to get
$\bar I(\hat\theta)$ or $\bar J(\hat\theta)$. In practice, this is done by numerical first and second differencing of
the log-likelihood at the MLE and the necessary matrices are obtained as byproducts
of the BHHH procedure for deriving the parameter estimates.

If the model is correctly specified, the estimates $\bar I(\hat\theta)$ and $\bar J(\hat\theta)$ should be broadly
similar, being estimators based on two different expressions for the same Fisher
information matrix. In practice, we could also estimate $I(\theta)^{-1}$ by $\bar J(\hat\theta)^{-1} \bar I(\hat\theta) \bar J(\hat\theta)^{-1}$, and
this anticipates the so-called sandwich estimator that is used in the QML procedure.


Properties of QMLEs. In this approach we assume that the true data-generating 
mechanism is a GARCH(p,q) model with non-Gaussian innovations, but we 
attempt to estimate the parameters of the process by maximizing the likelihood 
for a GARCH(p, q) model with Gaussian innovations. We still obtain consistent 
estimators of the model parameters and, if the true innovation distribution has a 
finite fourth moment, we again get asymptotic normality; however, the form of the 
asymptotic covariance matrix changes. 
We now distinguish between matrices $I(\theta)$ and $J(\theta)$, given by

$$I(\theta) = E\Bigl(\frac{\partial l_t(\theta)}{\partial\theta} \frac{\partial l_t(\theta)}{\partial\theta'}\Bigr), \qquad J(\theta) = -E\Bigl(\frac{\partial^2 l_t(\theta)}{\partial\theta\,\partial\theta'}\Bigr),$$

where the expectation is now taken with respect to the true model (not the misspecified
Gaussian model). The matrices $I(\theta)$ and $J(\theta)$ differ in general (unless the
Gaussian model is correct). It may be shown that

$$\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N_{p+q+1}(0, J(\theta)^{-1} I(\theta) J(\theta)^{-1}), \tag{4.41}$$

and the asymptotic covariance matrix is said to be of sandwich form; it can be
estimated by $\bar J(\hat\theta)^{-1} \bar I(\hat\theta) \bar J(\hat\theta)^{-1}$, where $\bar I(\theta)$ and $\bar J(\theta)$ are defined in (4.40). If
the model-checking procedures described below suggest that the dynamics have
been adequately described by the GARCH model, but the Gaussian assumption
seems doubtful, then standard errors for parameter estimates should be based on
this covariance matrix estimate.


Model checking. As with ARMA models it is usual to check fitted GARCH models
using residuals. We consider a general ARMA-GARCH model of the form $X_t - \mu_t =
\varepsilon_t = \sigma_t Z_t$, with $\mu_t$ and $\sigma_t$ as in Definition 4.22. In this model we distinguish
between unstandardized and standardized residuals. The former are the residuals
$\hat\varepsilon_1, \dots, \hat\varepsilon_n$ from the ARMA part of the model; they are calculated using the approach
in (4.15), and under the hypothesized model they should behave like a realization of
a pure GARCH process. The latter are reconstructed realizations of the SWN that
is assumed to drive the GARCH part of the model, and they are calculated from the
former by

$$\hat Z_t = \hat\varepsilon_t / \hat\sigma_t, \qquad \hat\sigma_t^2 = \hat\alpha_0 + \sum_{i=1}^{p_2} \hat\alpha_i \hat\varepsilon_{t-i}^2 + \sum_{j=1}^{q_2} \hat\beta_j \hat\sigma_{t-j}^2. \tag{4.42}$$

To use (4.42) we need some initial values, and one solution is to set required starting
values of $\hat\varepsilon_t$ equal to zero and required starting values of the volatility $\hat\sigma_t^2$ equal to
either the sample variance or zero. Because the first few values will be influenced
by these starting values, as well as the starting values required to calculate the
unstandardized residuals, they may be ignored in later analyses.
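For the first-order case $p_2 = q_2 = 1$ the residual recursion (4.42) can be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: the function name is hypothetical, the unstandardized residuals are taken as given, and the starting values are passed in explicitly. The check at the end generates residuals from known innovations with made-up parameters and confirms that filtering with the same parameters and starting values recovers those innovations.

```python
import math
import random


def standardized_residuals(eps, alpha0, alpha1, beta, sigma2_start, eps_start=0.0):
    """Apply (4.42) with p2 = q2 = 1 and return Z_hat_t = eps_t / sigma_hat_t."""
    sigma2, eps_prev = sigma2_start, eps_start
    z_hat = []
    for e in eps:
        # sigma_hat_t^2 = alpha0 + alpha1 * eps_{t-1}^2 + beta * sigma_hat_{t-1}^2
        sigma2 = alpha0 + alpha1 * eps_prev ** 2 + beta * sigma2
        z_hat.append(e / math.sqrt(sigma2))
        eps_prev = e
    return z_hat


# Round-trip check with illustrative (made-up) parameter values.
a0, a1, b1 = 0.05, 0.1, 0.85
rng = random.Random(42)
z_true = [rng.gauss(0.0, 1.0) for _ in range(500)]

eps, s2, e_prev = [], 1.0, 0.0
for z in z_true:
    s2 = a0 + a1 * e_prev ** 2 + b1 * s2
    e_prev = math.sqrt(s2) * z
    eps.append(e_prev)

z_hat = standardized_residuals(eps, a0, a1, b1, sigma2_start=1.0)
max_err = max(abs(u - v) for u, v in zip(z_hat, z_true))
print(f"largest reconstruction error: {max_err:.2e}")
```

With estimated rather than true parameters, and with arbitrary starting values, the reconstruction is only approximate, which is why the first few standardized residuals are often discarded.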

The standardized residuals should behave like an SWN and this can be investigated 
by constructing correlograms of raw and absolute values and applying portmanteau 
tests of strict white noise, as described in Section 4.2.3. 

Assuming that the SWN hypothesis is not rejected, so that the dynamics have 
been satisfactorily captured, the validity of the distribution used in the ML fitting 
can also be investigated using QQplots and goodness-of-fit tests for the normal or 
scaled t distributions. If the Gaussian likelihood does a reasonable job of estimating 
dynamics, but the residuals do not behave like iid standard normal observations, then 
the QML fitting philosophy can be adopted and standard errors can be estimated 
using the sandwich estimator implied by (4.41) above. 

This opens up the possibility of two-stage analyses where first the dynamics are 
estimated by QML methods and then the innovation distribution is modelled using 
the residuals from the dynamic model as data. The first stage is sometimes called 
pre-whitening of the data. In the second stage we might consider using heavier-tailed 
models than the Gaussian that also allow some asymmetry in the innovations. 

A disadvantage of the two-stage approach is that the error from the time series 
modelling propagates through to the distributional fitting in the second stage and the 
overall error is hard to quantify, but the procedure does lead to more transparency in 
model building and allows us to separate the tasks of volatility modelling and mod- 
elling the shocks that drive the process. In higher-dimensional risk-factor modelling 
it may be a useful pragmatic approach. 


Example 4.24 (GARCH model for Microsoft log-returns). We consider the
Microsoft daily log-returns for the period 1997-2000 (1009 values), as shown in
Figure 4.9. Although the raw returns show no evidence of serial correlation (see
Figure 4.10), their absolute values do show serial correlation and they fail a Ljung-Box
test (based on the first 10 estimated correlations) at the 5% level.

For these data, models with Student $t$ innovations are clearly preferred to models
with Gaussian innovations, so we adopt an ML approach to fitting models with $t$
innovations. We compare the standard GARCH(1, 1) model (with a constant mean term)
with models that incorporate ARMA structure (AR(1), MA(1) and ARMA(1, 1))
for the conditional mean; the ARMA structure seems to offer little improvement in
the model and the basic GARCH(1, 1) model is favoured in an Akaike comparison.
However, a model with a leverage term as in (4.31) does seem to offer an improvement.
Both the raw and absolute standardized residuals obtained from this model
show no visual evidence of serial correlation (see again Figure 4.10) and they do not
fail Ljung-Box tests. The estimated degrees-of-freedom parameter of the (scaled)
$t$ distribution is 6.30 (the standard error is 1.07) and a QQplot of the residuals against
this reference distribution reveals a satisfactory correspondence (see Figure 4.11).
The estimates of the remaining parameters (with standard errors) in this model are
given in Table 4.2.

Figure 4.9. Microsoft log-returns 1997-2000; data and estimate of
volatility from a GARCH(1, 1) model with a leverage term.

Figure 4.10. Microsoft log-returns 1997-2000; correlograms of data ((a) raw and (b) absolute
values) and residuals ((c) raw and (d) absolute values) from a GARCH(1, 1) model.

Figure 4.11. Microsoft log-returns 1997-2000; QQplot of residuals from a GARCH(1, 1)
model with leverage against a Student $t$ distribution with 6.30 degrees of freedom.

Table 4.2. Analysis of Microsoft log-returns for the period 1997-2000; ML estimates of
parameters and standard errors for a GARCH(1, 1) model with a leverage term under the
assumption of $t$ innovations.

Parameter     Estimate                  Standard error           Ratio
$\mu$         $9.35 \times 10^{-4}$     $7.21 \times 10^{-4}$     1.30
$\alpha_0$    $7.799 \times 10^{-5}$    $3.07 \times 10^{-5}$     2.54
$\alpha_1$    0.108                     0.0369                    2.91
$\beta$       0.778                     0.0673                   11.57
$\delta$      -0.178                    0.123                    -1.45


Notes and Comments 


The ARCH process was originally proposed by Engle (1982), and the GARCH 
process by Bollerslev (1986), who gave the condition for covariance stationarity. 
Overview texts on GARCH models include the book by Gourieroux (1997) and 
a number of useful review articles including Bollerslev, Chou and Kroner (1992), 
Bollerslev, Engle and Nelson (1994) and Shephard (1996). There are also substantial 
sections on GARCH models in the books by Alexander (2001), Tsay (2002) and 
Zivot and Wang (2003). The IGARCH model was first discussed by Engle and
Bollerslev (1986). 

The condition for strict stationarity of GARCH models was derived by Nelson 
(1990) in the case of the GARCH(1, 1) model and Bougerol and Picard (1992) for 
GARCH(p, q). The necessary theory involves the study of stochastic recurrence 
relations and goes back to Kesten (1973); Brandt (1986) is also a useful reference. 
Readable accounts of this theory may be found in Embrechts, Kluppelberg and 
Mikosch (1997), Mikosch and Starica (2000) and Mikosch (2003). 

For more on the derivation of conditional likelihood functions for ARCH and 
GARCH models see Hamilton (1994) and Tsay (2002). The BHHH algorithm 
(Berndt et al. 1974) is the most commonly used approach to numerically maximiz- 
ing the likelihood. For an informative general discussion of numerical optimization 
procedures in the context of maximum likelihood see Hamilton (1994, pp. 133-142). 
Standard general references on the QML approach are White (1981) and Gourieroux, 
Montfort and Trognon (1984). 

The essential asymptotic properties of MLEs and QMLEs in GARCH models 
are described in many publications, but the detailed mathematical proof has often 
lagged behind the assertions. Early papers appealed to regularity conditions for con- 
ditionally specified models such as those of Crowder (1976), which are essentially 
unverifiable. Lee and Hansen (1994) and Lumsdaine (1996) proved consistency and 
asymptotic normality of QMLEs in the GARCH(1, 1) model. More recently, Berkes, 
Horvath and Kokoszka (2003) have extended this to the GARCH(p, q) model under 
minimal assumptions, and Mikosch and Straumann (2005) and Straumann (2003) 
have given similar results for a wide variety of first-order models. 

From a more practical point-of-view, it is not easy to estimate GARCH model 
parameters to a high degree of accuracy because of the flatness of the typical likeli- 
hoods and the non-negligible influence of starting values in finite samples. Readers 
who write their own code may wish to compare their estimates with benchmark 
studies by McCullough and Renfro (1999) and Brooks, Burke and Persand (2001). 

Alternative innovation distributions to the Gaussian and scaled t distributions that 
have been considered include the generalized error distribution (GED) in Nelson 
(1991) and the normal inverse Gaussian (NIG) in Venter and de Jongh (2001); the 
latter authors present extensive evidence that the NIG is a good choice of innovation 
distribution for practical work and that GARCH inference based on the NIG is 
relatively robust to misspecification of the distribution. 

A great many extensions to the GARCH class have been proposed and thor- 
ough surveys may be found in Bollerslev, Engle and Nelson (1994) and Shephard 
(1996). Leverage effects in the GARCH model and the more general PGARCH 
(power GARCH) model are examined in Ding, Granger and Engle (1993). Various 
threshold GARCH models have been suggested; the model (4.32) is of the type sug- 
gested by Glosten, Jagannathan and Runkle (1993), while (4.33) is the switching- 
volatility GARCH (SV-GARCH) model of Fornari and Mele (1997). There have 
been proposals for non-parametric ARCH and GARCH modelling, including the 
multiplicative ARCH(p)-model of Yang, Hardle and Nielsen (1999) and the 
non-parametric GARCH procedure of Buhlmann and McNeil (2002). 


4.4 Volatility Models and Risk Estimation 


In this section we elaborate on some issues raised in the discussion of standard 
methods for market risks in Section 2.3. At that point the discussion of dynamic 
risk estimation procedures was kept relatively vague, but now, armed with more 
knowledge of time series models in general and GARCH in particular, we can give 
some more detail. The main issues are the estimation of conditional risk measures 
like VaR and expected shortfall and the backtesting of such estimates. Estimating 
VaR for a future time period requires us to be able to forecast volatility and we start 
with this topic. 


4.4.1 Volatility Forecasting 


As in our earlier discussion of time series prediction in Section 4.2.5, we describe a 
model-based strategy using a GARCH-type model, before presenting the more ad 
hoc technique of exponentially weighted moving-average (EWMA) prediction. 


GARCH-based volatility prediction. Suppose that the return data $X_{t-n+1}, \ldots, X_t$ 
follow a particular model in the GARCH family. We want to forecast future volatility, 
i.e. to predict the value of $\sigma_{t+h}$ for $h \geq 1$. This is closely related to the problem of 
predicting $X_{t+h}^2$ and uses an analogous method to that used for prediction in ARMA 
models in Section 4.2.5. We again assume that we have access to the infinite history 
of the process up to time $t$, represented by $\mathcal{F}_t = \sigma(\{X_s : s \leq t\})$, and then adapt 
our prediction formula to take account of the finiteness of the sample. 

Assume that the GARCH model has been fitted and its parameters estimated; we 
will suppress estimator notation for the parameters in the remainder of the section. 
We make calculations for simple models, from which the general procedure for more 
complex models should be clear. 


Example 4.25 (prediction in the GARCH(1, 1) model). Suppose that we use 
a pure GARCH(1, 1) model conforming to Definition 4.20. Assume the model is 
covariance stationary so that $E(X_t^2) = E(\sigma_t^2) < \infty$. Since $(X_t)_{t \in \mathbb{Z}}$ is a martingale 
difference, optimal predictions of $X_{t+h}$ are zero. A natural prediction of $X_{t+1}^2$ based 
on $\mathcal{F}_t$ is its conditional mean $\sigma_{t+1}^2$ given by 

$$E(X_{t+1}^2 \mid \mathcal{F}_t) = \sigma_{t+1}^2 = \alpha_0 + \alpha_1 X_t^2 + \beta \sigma_t^2,$$

and if $E(X_t^4) < \infty$, then this is the optimal squared error prediction. Note that the 
prediction of the random variable $X_{t+1}^2$ based on the information $\mathcal{F}_t$ is the value of 
$\sigma_{t+1}^2$, which is known at time $t$, being a function of the infinite history of the process. 
(The process $(\sigma_t)_{t \in \mathbb{Z}}$ is said to be previsible.) 

In practice we have to make an approximation based on this formula because the 
infinite series of past values that would allow us to calculate $\sigma_t^2$ is not available to us. 
A natural approach in applications is to approximate $\sigma_t^2$ by an estimate of squared 
volatility $\hat{\sigma}_t^2$ calculated from the residual equations (4.42). Our approximate forecast 
of $X_{t+1}^2$ also functions as an estimate of the squared volatility at time $t + 1$ and is 
given by 

$$\hat{\sigma}_{t+1}^2 = \hat{E}(X_{t+1}^2 \mid \mathcal{F}_t) = \alpha_0 + \alpha_1 X_t^2 + \beta \hat{\sigma}_t^2. \tag{4.43}$$

Thus equation (4.43) can be thought of as a recursive scheme for estimating volatility 
one step ahead. 
When we look $h > 1$ steps ahead given the information at time $t$, both $X_{t+h}^2$ and 
$\sigma_{t+h}^2$ are rvs. Their predictions coincide and are 

$$E(X_{t+h}^2 \mid \mathcal{F}_t) = E(\sigma_{t+h}^2 \mid \mathcal{F}_t)$$
$$= \alpha_0 + \alpha_1 E(X_{t+h-1}^2 \mid \mathcal{F}_t) + \beta E(\sigma_{t+h-1}^2 \mid \mathcal{F}_t)$$
$$= \alpha_0 + (\alpha_1 + \beta) E(X_{t+h-1}^2 \mid \mathcal{F}_t),$$

so that a general formula is 

$$E(X_{t+h}^2 \mid \mathcal{F}_t) = \alpha_0 \sum_{i=0}^{h-1} (\alpha_1 + \beta)^i + (\alpha_1 + \beta)^{h-1} (\alpha_1 X_t^2 + \beta \sigma_t^2),$$

and we obtain a practical formula by substituting an estimate of squared volatility 
$\hat{\sigma}_t^2$ as before. As $h \to \infty$ we observe that $E(\sigma_{t+h}^2 \mid \mathcal{F}_t) \to \alpha_0/(1 - \alpha_1 - \beta)$, 
almost surely, so that the prediction of squared volatility converges to the unconditional 
variance of the process. A concrete example of volatility prediction in a 
GARCH(1, 1) model is given in Figure 4.12 for the Microsoft data analysed in 
Example 4.24. 
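
The recursion and the general $h$-step formula of Example 4.25 can be sketched in a few lines of Python. This is only an illustration: the function names and the parameter values are hypothetical, and the squared volatility at time $t$ is assumed to have been estimated already from the residual equations.

```python
def garch11_forecast(alpha0, alpha1, beta, x_t, sigma2_t, horizon):
    """Predict E(X_{t+h}^2 | F_t) for h = 1, ..., horizon in a GARCH(1, 1) model.

    x_t is the latest return and sigma2_t the (estimated) squared volatility
    at time t.
    """
    persistence = alpha1 + beta
    # One step ahead: the recursion (4.43).
    prediction = alpha0 + alpha1 * x_t ** 2 + beta * sigma2_t
    forecasts = [prediction]
    # Further steps: each prediction is alpha0 + (alpha1 + beta) * previous one.
    for _ in range(horizon - 1):
        prediction = alpha0 + persistence * prediction
        forecasts.append(prediction)
    return forecasts


def garch11_forecast_closed_form(alpha0, alpha1, beta, x_t, sigma2_t, h):
    """Evaluate the general h-step formula directly, as a check on the recursion."""
    persistence = alpha1 + beta
    geometric_sum = sum(persistence ** i for i in range(h))
    return alpha0 * geometric_sum + persistence ** (h - 1) * (
        alpha1 * x_t ** 2 + beta * sigma2_t
    )


# Hypothetical parameter values with alpha1 + beta < 1 (covariance stationary).
forecasts = garch11_forecast(1e-5, 0.1, 0.8, x_t=0.02, sigma2_t=2e-4, horizon=500)
long_run_variance = 1e-5 / (1 - 0.1 - 0.8)
```

For long horizons the entries of `forecasts` approach `long_run_variance`, in line with the convergence to the unconditional variance noted in the example.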


We now consider a second example, which combines what we know about pre- 
diction in ARMA and GARCH models. 


Example 4.26 (prediction in an ARMA(1, 1)-GARCH(1, 1) model). Consider 
a process of the form $X_t - \mu_t = \varepsilon_t = \sigma_t Z_t$, where $\mu_t$ and $\sigma_t$ are $\mathcal{F}_{t-1}$-measurable 
rvs describing, respectively, an ARMA(1, 1) model and a GARCH(1, 1) model as in 
Definition 4.22. Prediction formulas for this model follow easily from Examples 4.15 
and 4.25. We calculate that 

$$E(X_{t+h} \mid \mathcal{F}_t) = \mu + \phi^h (X_t - \mu) + \phi^{h-1} \theta \varepsilon_t, \tag{4.44}$$
$$\operatorname{var}(X_{t+h} \mid \mathcal{F}_t) = \alpha_0 \sum_{i=0}^{h-1} (\alpha_1 + \beta)^i + (\alpha_1 + \beta)^{h-1} (\alpha_1 \varepsilon_t^2 + \beta \sigma_t^2), \tag{4.45}$$

and these are approximated by substituting inferred values for $\varepsilon_t$ and $\sigma_t$ obtained 
from the residual equations (4.42). Equation (4.44) yields predictions of $\mu_{t+h}$ or 
$X_{t+h}$, and equation (4.45) yields predictions of $\varepsilon_{t+h}^2$ or $\sigma_{t+h}^2$. 


Exponential smoothing for volatility. Suppose we believe our return data follow 
some kind of underlying time series model in which a volatility (conditional standard 
deviation) is defined, but that we do not wish to specify the exact parametric model. 

Figure 4.12. Estimate of volatility for the final days of the year 2000 and predictions of 
volatility for the first 10 days of 2001 based on a GARCH(1, 1) model (without leverage) 
fitted to the Microsoft return data in Example 4.24. 


We can apply exponential smoothing as in (4.16) to the squared observations to get 
a procedure that follows the updating scheme 

$$P_t X_{t+1}^2 = \alpha X_t^2 + (1 - \alpha) P_{t-1} X_t^2, \tag{4.46}$$

for some value of $\alpha$. 

Since the expectations of $X_{t+1}^2$ and $\sigma_{t+1}^2$ coincide, we could alternatively 
regard (4.46) as an exponential smoothing scheme for the unobserved squared 
volatility. We could define a recursive scheme for one-step-ahead volatility 
forecasting by 

$$\hat{\sigma}_{t+1}^2 = \alpha X_t^2 + (1 - \alpha) \hat{\sigma}_t^2. \tag{4.47}$$

This is the essential idea of the EWMA approach to volatility forecasting. 

If we compare (4.47) with the one-step-ahead volatility estimation scheme defined 
by a GARCH(1, 1) model in (4.43), it is tempting to say that EWMA corresponds to 
estimating volatility using a conditional-expectation-based technique in an IGARCH 
model where the parameter $\alpha_0$ equals zero, although this analogy should be used 
with care. GARCH and IGARCH models with $\alpha_0 = 0$ are not well defined and the 
solution of the stochastic recurrence relation in (4.29) vanishes. Moreover, IGARCH 
is not covariance stationary. It is better to regard EWMA as a sensible model-free 
approach to volatility forecasting based on the classical technique of exponential 
smoothing. 
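
As a minimal sketch, the EWMA recursion (4.47) can be written as a short loop in Python. The function name, the starting value `initial_variance` and the sample returns are assumptions made for illustration; the text does not prescribe how the recursion should be initialized.

```python
def ewma_variance(returns, alpha, initial_variance):
    """Apply sigma2_{t+1} = alpha * X_t^2 + (1 - alpha) * sigma2_t along a series.

    Entry k of the result is the squared-volatility forecast for the period
    following returns[k]. initial_variance is an assumed starting value for the
    recursion (for example, a sample variance of earlier data).
    """
    sigma2 = initial_variance
    forecasts = []
    for x in returns:
        sigma2 = alpha * x ** 2 + (1.0 - alpha) * sigma2
        forecasts.append(sigma2)
    return forecasts


# Hypothetical daily returns; alpha = 0.06 puts weight 0.94 on the previous estimate.
example = ewma_variance([0.01, -0.02], alpha=0.06, initial_variance=1e-4)
```

The influence of the starting value decays geometrically at rate $1 - \alpha$, so with a reasonably long return series its choice has little effect on the most recent forecasts.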


4.4.2 Conditional Risk Measurement 


We now return to the conditional risk-measurement problem discussed in Chapter 2 
and consider the situation where we wish to measure risk for an investment in a 
single stock, index or currency. On day $t$ the value of the position is $V_t$ and the 
log-return for the next day is $X_{t+1}$, so the (linearized) loss over the next day is 
$L_{t+1}^{\Delta} = l_{[t]}^{\Delta}(X_{t+1}) = -V_t X_{t+1}$. We require risk-measure estimates, such as VaR 
and ES, for the conditional distribution $F_{L_{t+1} \mid \mathcal{F}_t}$, where $\mathcal{F}_t = \sigma(\{X_s : s \leq t\})$. We 
set $V_t = 1$ for all $t$ and write $L_t = -X_t$, so $(L_t)$ is the process of negative log-returns. 


Remark 4.27. Although we consider a simple univariate situation, the methodology 
of this section can be applied to portfolio losses in the context of historical simulation. 
Suppose we have constructed historical-simulation data $(L_t)$ using the approach 
described in Section 2.3.2. Writing $L_{t+1} = l_{[t]}(\mathbf{X}_{t+1})$ for our loss over the next day, 
where $l_{[t]}$ is the loss operator and $\mathbf{X}_{t+1}$ the vector of risk-factor changes, we wish 
to calculate risk-measure estimates for the conditional distribution $F_{L_{t+1} \mid \mathcal{G}_t}$, where 
$\mathcal{G}_t = \sigma(\{L_s : s \leq t\})$. We simply apply the methodology of this section to the time 
series $(L_t)$; this strategy was used in Section 2.3.6. 


To calculate conditional risk measures we make the following assumption. 


Assumption 4.28. The process of losses $(L_t)_{t \in \mathbb{Z}}$ is adapted to the filtration $(\mathcal{F}_t)_{t \in \mathbb{Z}}$ 
and follows a stationary model of the form $L_t = \mu_t + \sigma_t Z_t$, where $\mu_t$ and $\sigma_t$ are 
$\mathcal{F}_{t-1}$-measurable and the $(Z_t)$ are SWN(0, 1) innovations. 


A concrete example of a model satisfying Assumption 4.28 would be an (invertible) 
ARMA process with GARCH errors of the kind analysed in this chapter. Under 
the assumption, if we write $G$ for the df of $(Z_t)$, we can easily calculate that 

$$F_{L_{t+1} \mid \mathcal{F}_t}(l) = P(\mu_{t+1} + \sigma_{t+1} Z_{t+1} \leq l \mid \mathcal{F}_t) = G((l - \mu_{t+1})/\sigma_{t+1}). \tag{4.48}$$

Thus calculation of risk measures for the conditional one-period loss distribution 
amounts to calculating risk measures for the innovation distribution $G$. Using the 
approach of Examples 2.14 and 2.18 we easily obtain 

$$\mathrm{VaR}_\alpha^t = \mu_{t+1} + \sigma_{t+1} q_\alpha(Z), \tag{4.49}$$
$$\mathrm{ES}_\alpha^t = \mu_{t+1} + \sigma_{t+1} \mathrm{ES}_\alpha(Z), \tag{4.50}$$

where $Z$ is a generic rv with df $G$. In general, to estimate the risk measures (4.49) 
and (4.50), we require estimates of $\mu_{t+1}$ and $\sigma_{t+1}$, the conditional mean and volatility 
of the loss process. We also require the quantile and expected shortfall of the 
innovation df $G$. In a model with Gaussian innovations the latter are $q_\alpha(Z) = \Phi^{-1}(\alpha)$ 
and $\mathrm{ES}_\alpha(Z) = \phi(\Phi^{-1}(\alpha))/(1 - \alpha)$. In a model with non-Gaussian innovations, 
$q_\alpha(Z)$ and $\mathrm{ES}_\alpha(Z)$ depend on any further parameters of the innovation distribution. 
For example, we might assume (scaled) t innovations; in this case the quantile and 
expected shortfall of a standard univariate t distribution (the latter given in (2.27)) 
would have to be scaled by the factor $\sqrt{(\nu - 2)/\nu}$ to take account of the fact that 
the innovation distribution is scaled to have variance one. 
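
For the Gaussian case, formulas (4.49) and (4.50) can be evaluated with the Python standard library alone. The sketch below is illustrative: the function name and the input values are hypothetical, and the forecasts of the conditional mean and volatility are assumed to be available from a fitted model.

```python
from statistics import NormalDist


def gaussian_var_es(mu_next, sigma_next, alpha):
    """Conditional VaR and ES, (4.49) and (4.50), under Gaussian innovations.

    mu_next and sigma_next are forecasts of the conditional mean and volatility
    of the next loss, assumed to be supplied by the user.
    """
    std_normal = NormalDist()          # standard normal innovation df G
    q = std_normal.inv_cdf(alpha)      # q_alpha(Z)
    es_z = std_normal.pdf(q) / (1.0 - alpha)  # ES_alpha(Z)
    return mu_next + sigma_next * q, mu_next + sigma_next * es_z


# Hypothetical inputs: zero conditional mean and 2% daily volatility.
var_99, es_99 = gaussian_var_es(0.0, 0.02, 0.99)
```

A scaled t version would replace the two standard normal quantities by the t quantile and t expected shortfall multiplied by $\sqrt{(\nu - 2)/\nu}$; it is omitted here because the standard library has no t quantile function.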

Concrete estimation strategies we might adopt, in order of decreasing sophistication, 
include the following. 

(1) Fit an ARMA-GARCH model with an appropriate innovation distribution 
to the data $L_{t-n+1}, \ldots, L_t$ by ML and use the prediction methodology discussed 
in Section 4.4.1 to estimate $\sigma_{t+1}$ and $\mu_{t+1}$. Any further parameters of 
the innovation distribution would be estimated simultaneously in the model 
fitting. 

(2) Fit an ARMA-GARCH model by QML and use prediction methodology as 
in strategy (1) to estimate $\sigma_{t+1}$ and $\mu_{t+1}$. In a separate second step we use the 
model residuals to find an appropriate innovation distribution and estimate its 
parameters. 

(3) Use EWMA to estimate $\sigma_{t+1}$ and set $\mu_{t+1}$ to zero (as it is less important). 
In conjunction with an assumption of Gaussian innovations this is essentially 
the RiskMetrics method. Instead of making the Gaussian assumption, we 
could standardize each of the losses $L_{t-n+1}, \ldots, L_t$ by dividing by volatility 
estimates $\hat{\sigma}_{t-n+1}, \ldots, \hat{\sigma}_t$ calculated using EWMA. This would yield a set of 
residuals, from which the innovation distribution could be estimated as in 
strategy (2). 


4.4.3 Backtesting 


Backtesting VaR. Using the notation of the previous section, we first observe that 
if we define indicator variables $I_{t+1} = I_{\{L_{t+1} > \mathrm{VaR}_\alpha^t\}}$, indicating violations of the 
quantiles of the conditional loss distribution, then the process $(I_t)_{t \in \mathbb{Z}}$ is a process 
of iid Bernoulli variables with success (i.e. violation) probability $1 - \alpha$. This 
property is certainly true under Assumption 4.28, since it follows from (4.49) that 
$I_{t+1} = I_{\{L_{t+1} > \mathrm{VaR}_\alpha^t\}} = I_{\{Z_{t+1} > q_\alpha(Z)\}}$ and the innovations $(Z_t)$ are themselves iid. 
However, it is also more generally true, as the following lemma shows. 


Lemma 4.29. Let $(Y_t)_{t \in \mathbb{Z}}$ be a sequence of Bernoulli indicator variables adapted 
to a filtration $(\mathcal{F}_t)_{t \in \mathbb{Z}}$, and satisfying $E(Y_t \mid \mathcal{F}_{t-1}) = p > 0$ for all $t$. Then $(Y_t)$ is a 
process of iid Bernoulli variables. 


Proof. The process $(Y_t - p)_{t \in \mathbb{Z}}$ has the martingale-difference property (see 
Definition 4.6). Moreover, $\operatorname{var}(Y_t - p) = E(E((Y_t - p)^2 \mid \mathcal{F}_{t-1})) = p(1 - p)$ for all $t$. 
Therefore $(Y_t - p)$ and hence $(Y_t)$ are white noise processes of uncorrelated variables. 
It is easily shown that identically distributed uncorrelated Bernoulli variables 
are iid. 


In practice, we make one-step-ahead conditional VaR estimates $\widehat{\mathrm{VaR}}_\alpha^t$ and 
consider the violation indicators 

$$\hat{I}_{t+1} := I_{\{L_{t+1} > \widehat{\mathrm{VaR}}_\alpha^t\}}. \tag{4.51}$$

If we are successful in estimating conditional quantiles, we would expect that the 
empirical violation indicators would behave like realizations of iid Bernoulli trials 
with success probability $(1 - \alpha)$. 

Checking for iid Bernoulli violations of the one-step-ahead VaR has two aspects: 
checking that the number of violations is correct on average, and checking that the 
pattern of violations is consistent with iid behaviour. Certainly, if we calculate VaR 
estimates for times $t = 1, \ldots, m$, we expect that $\sum_{t=1}^{m} \hat{I}_t \sim B(m, 1 - \alpha)$, and this 
is easily addressed with a standard two-sided binomial test (see, for example, Casella 
and Berger 2002, pp. 493-495). Departures from the null hypothesis would suggest 
either systematic underestimation or overestimation of VaR. 
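
A two-sided binomial test of the violation count can be sketched as follows. Several conventions exist for the two-sided p-value; this illustration uses one common choice (summing the probabilities of all outcomes that are no more likely than the observed count), which is an assumption on our part and not necessarily the version in the cited reference. The function names are hypothetical.

```python
from math import exp, lgamma, log


def binomial_pmf(k, m, p):
    """P(B = k) for B ~ B(m, p) with 0 < p < 1, computed on the log scale."""
    log_pmf = (
        lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
        + k * log(p) + (m - k) * log(1.0 - p)
    )
    return exp(log_pmf)


def violation_count_pvalue(violations, m, alpha):
    """Two-sided p-value for H0: number of VaR violations ~ B(m, 1 - alpha)."""
    p = 1.0 - alpha
    probabilities = [binomial_pmf(k, m, p) for k in range(m + 1)]
    observed = probabilities[violations]
    # Sum the probabilities of all outcomes no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    tail = sum(q for q in probabilities if q <= observed * (1.0 + 1e-7))
    return min(1.0, tail)


# Hypothetical backtest: 250 days of 99% VaR estimates, so 2.5 violations expected.
p_few = violation_count_pvalue(2, 250, 0.99)
p_many = violation_count_pvalue(10, 250, 0.99)
```

With these inputs, 2 violations is entirely consistent with the null hypothesis, whereas 10 violations gives a small p-value and would suggest systematic underestimation of VaR.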

To check for independence of the Bernoulli indicators one possibility is to perform 
a runs test of the kind described by David (1947), which involves counting runs of 
successive zeros or ones in the realizations of the indicator variables and comparing 
the realized number of runs with the known sampling distribution of the number of 
runs in iid Bernoulli data (see also Notes and Comments). 
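
The counting step of a runs test is easy to illustrate. The sketch below counts runs and standardizes the count using the well-known mean and variance of the number of runs conditional on the numbers of ones and zeros (the Wald-Wolfowitz form). This large-sample standardization is our assumption for illustration; it is not the exact sampling distribution referred to above, and it can be a poor approximation when violations are rare.

```python
from math import sqrt


def count_runs(indicators):
    """Count maximal blocks of identical consecutive values in a 0/1 sequence."""
    if not indicators:
        return 0
    changes = sum(1 for prev, cur in zip(indicators, indicators[1:]) if prev != cur)
    return 1 + changes


def runs_z_score(indicators):
    """Standardized number of runs, conditional on the counts of ones and zeros."""
    n = len(indicators)
    n1 = sum(indicators)
    n0 = n - n1
    mean_runs = 2.0 * n1 * n0 / n + 1.0
    var_runs = 2.0 * n1 * n0 * (2.0 * n1 * n0 - n) / (n ** 2 * (n - 1.0))
    if var_runs <= 0.0:
        raise ValueError("sequence too short or constant; z-score undefined")
    return (count_runs(indicators) - mean_runs) / sqrt(var_runs)


# Hypothetical indicator sequences: clustered violations give too few runs.
z_clustered = runs_z_score([0] * 15 + [1] * 5)
z_alternating = runs_z_score([0, 1] * 10)
```

A strongly negative score points to clustering of violations, which is the typical failure mode when volatility dynamics are ignored.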

The backtesting of conditional VaR estimates for the h-period loss distribution 
is more complicated. To use the kind of ideas above we would have to base our 
backtests on non-overlapping periods. For example, if we calculated two-week VaRs, 
we could make a comparison of the VaR estimate and the realized loss every two 
weeks, which would clearly lead to a relatively small amount of violation data with 
which to monitor the performance of the model. If we used overlapping periods, for 
example by recording the violation indicator value every day for the loss incurred 
over the previous two weeks, we would create a series of dependent Bernoulli trials 
for which formal inference would be difficult. 


Backtesting expected shortfall. We begin by observing that if $\mathrm{ES}_\alpha^t$ is the expected 
shortfall of the (continuous) conditional loss distribution $F_{L_{t+1} \mid \mathcal{F}_t}$ and we define 
$S_{t+1} = (L_{t+1} - \mathrm{ES}_\alpha^t) I_{t+1}$, then for an arbitrary loss process $(L_t)_{t \in \mathbb{Z}}$ the process 
$(S_t)_{t \in \mathbb{Z}}$ forms a martingale difference series satisfying $E(S_{t+1} \mid \mathcal{F}_t) = 0$. Moreover, 
under Assumption 4.28 and using (4.49) and (4.50), we have 

$$S_{t+1} = \sigma_{t+1} (Z_{t+1} - \mathrm{ES}_\alpha(Z)) I_{\{Z_{t+1} > q_\alpha(Z)\}},$$

which takes the form of a volatility times a zero-mean iid sequence of innovation 
variables $((Z_{t+1} - \mathrm{ES}_\alpha(Z)) I_{\{Z_{t+1} > q_\alpha(Z)\}})$. This suggests that, in practice, when the 
risk measures and volatility are estimated, we could form violation residuals of the 
form 

$$\hat{R}_{t+1} := \hat{S}_{t+1}/\hat{\sigma}_{t+1}, \quad \hat{S}_{t+1} := (L_{t+1} - \widehat{\mathrm{ES}}_\alpha^t) \hat{I}_{t+1}, \tag{4.52}$$

where $\hat{I}_{t+1}$ is the violation indicator defined in (4.51). We expect these to behave 
like realizations of iid variables from a distribution with mean zero and an atom of 
probability mass of size $\alpha$ at zero. To test for mean-zero behaviour we could perform 
a bootstrap test on the non-zero violation residuals that makes no assumption about 
their distribution. See Efron and Tibshirani (1994, p. 224) for a description of such 
a test. 
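
The construction of the violation residuals in (4.52) can be sketched directly. The function name and the numerical inputs below are hypothetical, and the bootstrap test itself is not implemented here; the sketch only produces the residuals on which such a test would operate.

```python
def violation_residuals(losses, var_estimates, es_estimates, vol_estimates):
    """Violation residuals (4.52): (L - ES_hat) / sigma_hat on violation days, else 0.

    All four sequences are aligned by day: entry k of each estimate is the
    forecast made on the previous day for the loss losses[k].
    """
    residuals = []
    for loss, var, es, vol in zip(losses, var_estimates, es_estimates, vol_estimates):
        if loss > var:                       # violation indicator (4.51) equals one
            residuals.append((loss - es) / vol)
        else:
            residuals.append(0.0)
    return residuals


# Hypothetical three-day backtest with constant estimates.
residuals = violation_residuals(
    losses=[0.01, 0.06, 0.05],
    var_estimates=[0.0465] * 3,
    es_estimates=[0.0533] * 3,
    vol_estimates=[0.02] * 3,
)
non_zero = [r for r in residuals if r != 0.0]
```

Under a well-specified model the entries of `non_zero` should scatter around zero; a persistently positive mean would indicate that expected shortfall is being underestimated.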


Backtesting the predictive distribution. As well as backtesting VaR and expected 
shortfall we can also devise tests that assess the overall quality of the estimated 
conditional loss distributions from which the risk-measure estimates are derived. Of 
course, our primary interest focuses on the measures of tail risk, but it is still useful 
to backtest our estimates of the whole predictive distribution to obtain additional 
confirmation of the risk-measure estimation procedure. 

Suppose we define the process $(U_t)_{t \in \mathbb{Z}}$ by setting $U_{t+1} := F_{L_{t+1} \mid \mathcal{F}_t}(L_{t+1})$. Under 
Assumption 4.28 it follows easily from (4.48) that $U_{t+1} = G_Z(Z_{t+1})$, so $(U_t)$ is 
a strict white noise process. Moreover, if $G_Z$ is continuous, then the stationary 
or unconditional distribution of $(U_t)$ must be standard uniform (see, for example, 
Proposition 5.2). 

In actual applications we estimate $F_{L_{t+1} \mid \mathcal{F}_t}$ from data up to time $t$ and we backtest 
our estimates by forming $\hat{U}_{t+1} = \hat{F}_{L_{t+1} \mid \mathcal{F}_t}(L_{t+1})$ on day $t + 1$. Suppose we 
estimate the predictive distribution on days $t = 0, \ldots, m - 1$ and form backtesting 
data $\hat{U}_1, \ldots, \hat{U}_m$; we expect these to behave like a sample of iid uniform data. The 
distributional assumption can be assessed by standard goodness-of-fit tests like the 
chi-squared test or Kolmogorov-Smirnov test (see Section 3.3.5 for references). It 
is also possible to form the data $\Phi^{-1}(\hat{U}_1), \ldots, \Phi^{-1}(\hat{U}_m)$, where $\Phi$ is the standard 
normal df; these should behave like iid standard normal data (see again 
Proposition 5.2) and this can be tested as in Section 3.1.4. The strict white noise assumption 
can be tested using the approach described in Section 4.2.3. 
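
The two transformations used in this backtest are simple to compute once conditional mean and volatility forecasts are available. The following sketch assumes a location-scale predictive distribution as in (4.48) with a Gaussian innovation df by default; the function names and inputs are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def pit_values(losses, mu_forecasts, sigma_forecasts, innovation_cdf=_STD_NORMAL.cdf):
    """Evaluate the estimated predictive df at each realized loss.

    Under (4.48) the predictive df is G((l - mu) / sigma); innovation_cdf plays
    the role of the estimated G and defaults to the standard normal df.
    """
    return [
        innovation_cdf((loss - mu) / sigma)
        for loss, mu, sigma in zip(losses, mu_forecasts, sigma_forecasts)
    ]


def normal_scores(u_values):
    """Map uniform backtesting data to the normal scale via the inverse normal df.

    Values must lie strictly between 0 and 1, otherwise inv_cdf raises an error.
    """
    return [_STD_NORMAL.inv_cdf(u) for u in u_values]


# Hypothetical realized losses with zero mean forecasts and 2% volatility forecasts.
u_hat = pit_values([0.0, 0.02], [0.0, 0.0], [0.02, 0.02])
z_hat = normal_scores(u_hat)
```

The resulting values can then be passed to whichever goodness-of-fit and serial-dependence tests are preferred.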


Notes and Comments 


The backtesting material is mainly taken from McNeil and Frey (2000), where exam- 
ples of the binomial test for violation counts and the test of expected shortfall using 
exceedance residuals can be found. Use of the runs test for testing the randomness 
of VaR violations is suggested by Christoffersen, Diebold and Schuermann (1998). 
This test is shown to be uniformly most powerful against Markovian alternatives by 
Lehmann (1986). Christoffersen, Diebold and Schuermann also suggest the use of 
a further test for randomness based on the non-trivial eigenvalue of the transition 
matrix in a Markov chain model for the violation indicator variables. 

The idea of testing the estimate of the predictive distribution may be found in 
Berkowitz (2001, 2002). See also Berkowitz and O’Brien (2002) for a more general 
article on testing the accuracy of the VaR models of commercial banks. 


4.5 Fundamentals of Multivariate Time Series 


The presentation of the basic concepts of multivariate time series in this section 
closely parallels the presentation of the corresponding ideas for univariate time 
series in Section 4.2. Again the approach is similar to that of Brockwell and Davis 
(1991, 2002). 


4.5.1 Basic Definitions 


A multivariate time series model for multiple risk factors is a stochastic process 
$(\mathbf{X}_t)_{t \in \mathbb{Z}}$, i.e. a family of random vectors, indexed by the integers and defined on 
some probability space $(\Omega, \mathcal{F}, P)$. 


Moments of a multivariate time series. Assuming they exist, we define the mean 
function $\boldsymbol{\mu}(t)$ and the covariance matrix function $\Gamma(t, s)$ of $(\mathbf{X}_t)_{t \in \mathbb{Z}}$ by 

$$\boldsymbol{\mu}(t) = E(\mathbf{X}_t), \quad t \in \mathbb{Z},$$
$$\Gamma(t, s) = E((\mathbf{X}_t - \boldsymbol{\mu}(t))(\mathbf{X}_s - \boldsymbol{\mu}(s))'), \quad t, s \in \mathbb{Z}.$$

Analogously to the univariate case, we have $\Gamma(t, t) = \operatorname{cov}(\mathbf{X}_t)$. By observing that 
the elements $\gamma_{ij}(t, s)$ of $\Gamma(t, s)$ satisfy 

$$\gamma_{ij}(t, s) = \operatorname{cov}(X_{t,i}, X_{s,j}) = \operatorname{cov}(X_{s,j}, X_{t,i}) = \gamma_{ji}(s, t), \tag{4.53}$$

it is clear that $\Gamma(t, s) = \Gamma(s, t)'$ for all $t, s$. However, the matrix $\Gamma$ need not be 
symmetric, so in general $\Gamma(t, s) \neq \Gamma(s, t)$, which is in contrast to the univariate 
case. Lagged values of one of the component series can be more strongly correlated 
with future values of another component series than vice versa. This property, when 
observed in empirical data, is known as a lead-lag effect and is discussed in more 
detail in Example 4.36. 


Stationarity. Again the concrete multivariate models we consider will be stationary 
in one or both of the following senses. 


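Definition 4.30 (strict stationarity). The multivariate time series $(\mathbf{X}_t)_{t \in \mathbb{Z}}$ is strictly 
stationary if 

$$(\mathbf{X}_{t_1}', \ldots, \mathbf{X}_{t_n}') \stackrel{d}{=} (\mathbf{X}_{t_1+k}', \ldots, \mathbf{X}_{t_n+k}')$$

for all $t_1, \ldots, t_n, k \in \mathbb{Z}$ and for all $n \in \mathbb{N}$. 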


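Definition 4.31 (covariance stationarity). The multivariate time series $(\mathbf{X}_t)_{t \in \mathbb{Z}}$ is 
covariance stationary (or weakly or second-order stationary) if the first two moments 
exist and satisfy 

$$\boldsymbol{\mu}(t) = \boldsymbol{\mu}, \quad t \in \mathbb{Z},$$
$$\Gamma(t, s) = \Gamma(t + k, s + k), \quad t, s, k \in \mathbb{Z}.$$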


A strictly stationary multivariate time series with finite covariance matrix is covari- 
ance stationary, but we again note that it is possible to define infinite-variance pro- 
cesses (including certain multivariate ARCH and GARCH processes) that are strictly 
stationary but not covariance stationary. 


Serial and cross-correlation in stationary multivariate time series. The definition 
of covariance stationarity implies that for all $s, t$ we have $\Gamma(t - s, 0) = \Gamma(t, s)$, so 
that the covariance between $\mathbf{X}_t$ and $\mathbf{X}_s$ only depends on their temporal separation 
$t - s$, which is known as the lag. In contrast to the univariate case, the sign of 
the lag is important. For a covariance-stationary multivariate process we write the 
covariance matrix function as a function of one variable: $\Gamma(h) := \Gamma(h, 0)$, $\forall h \in \mathbb{Z}$. 
Noting that $\Gamma(0) = \operatorname{cov}(\mathbf{X}_t)$, $\forall t$, we can now define the correlation matrix function 
of a covariance-stationary process. 


Definition 4.32 (correlation matrix function). Writing $\Delta := \Delta(\Gamma(0))$, where 
$\Delta(\cdot)$ is the operator defined in (3.4), the correlation matrix function $P(h)$ of a 
covariance-stationary multivariate time series $(\mathbf{X}_t)_{t \in \mathbb{Z}}$ is 

$$P(h) := \Delta^{-1} \Gamma(h) \Delta^{-1}, \quad \forall h \in \mathbb{Z}. \tag{4.54}$$

The diagonal entries $\rho_{ii}(h)$ of this matrix-valued function give the autocorrelation 
function of the $i$th component series $(X_{t,i})_{t \in \mathbb{Z}}$. The off-diagonal entries give 
so-called cross-correlations between different component series at different times. It 
follows from (4.53) that $P(h) = P(-h)'$, but $P(h)$ need not be symmetric, and in 
general $P(h) \neq P(-h)$. 


White noise processes. As in the univariate case, multivariate white noise processes 
are building blocks for more interesting classes of time series model. 


Definition 4.33 (multivariate white noise). $(\mathbf{X}_t)_{t \in \mathbb{Z}}$ is multivariate white noise if 
it is covariance stationary with correlation matrix function given by 

$$P(h) = \begin{cases} P, & h = 0, \\ 0, & h \neq 0, \end{cases}$$

for some positive-definite correlation matrix $P$. 


A multivariate white noise process with mean zero and covariance matrix 
$\Sigma = \operatorname{cov}(\mathbf{X}_t)$ will be denoted WN($\mathbf{0}$, $\Sigma$). Such a process has no cross-correlation 
between component series, except for contemporaneous cross-correlation at lag 
zero. A simple example is a series of iid random vectors with finite covariance 
matrix, and this is known as a multivariate strict white noise. 


Definition 4.34 (multivariate strict white noise). $(\mathbf{X}_t)_{t \in \mathbb{Z}}$ is multivariate strict 
white noise if it is a series of iid random vectors with finite covariance matrix. 


A strict white noise process with mean zero and covariance matrix $\Sigma$ will be 
denoted SWN($\mathbf{0}$, $\Sigma$). 

The martingale-difference noise concept may also be extended to higher dimensions. 
As before we assume that the time series $(\mathbf{X}_t)_{t \in \mathbb{Z}}$ is adapted to some filtration 
$(\mathcal{F}_t)$, typically the natural filtration $(\sigma(\{\mathbf{X}_s : s \leq t\}))$, which represents the 
information available at time $t$. 


Definition 4.35 (multivariate martingale difference). $(\mathbf{X}_t)_{t \in \mathbb{Z}}$ has the multivariate 
martingale-difference property with respect to the filtration $(\mathcal{F}_t)$ if $E|\mathbf{X}_t| < \infty$ and 

$$E(\mathbf{X}_t \mid \mathcal{F}_{t-1}) = \mathbf{0}, \quad \forall t \in \mathbb{Z}.$$


The unconditional mean of such a process is obviously also zero and, if $\operatorname{cov}(\mathbf{X}_t) < \infty$ 
for all $t$, the covariance matrix function satisfies $\Gamma(t, s) = 0$ for $t \neq s$. If the 
covariance matrix is also constant for all $t$, then a process with the multivariate 
martingale-difference property is also a multivariate white noise process. 


4.5.2 Analysis in the Time Domain 


We now assume that we have a random sample $\mathbf{X}_1, \ldots, \mathbf{X}_n$ from a 
covariance-stationary multivariate time series model $(\mathbf{X}_t)_{t \in \mathbb{Z}}$. In the time domain we construct 
empirical estimators of the covariance matrix function and the correlation matrix 
function from this random sample. 

The sample covariance matrix function is calculated according to 

$$\hat{\Gamma}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (\mathbf{X}_{t+h} - \bar{\mathbf{X}})(\mathbf{X}_t - \bar{\mathbf{X}})', \quad 0 \leq h < n,$$

where $\bar{\mathbf{X}} = \sum_{t=1}^{n} \mathbf{X}_t / n$ is the sample mean, which estimates $\boldsymbol{\mu}$, the mean of the 
time series. Writing $\hat{\Delta} := \Delta(\hat{\Gamma}(0))$, where $\Delta(\cdot)$ is the operator defined in (3.4), the 
sample correlation matrix function is 

$$\hat{P}(h) = \hat{\Delta}^{-1} \hat{\Gamma}(h) \hat{\Delta}^{-1}, \quad 0 \leq h < n.$$

The information contained in the elements $\hat{\rho}_{ij}(h)$ of the sample correlation matrix 
function is generally displayed in the cross-correlogram, which is a $d \times d$ matrix 
of plots (see Figure 4.13 for an example). The $i$th diagonal plot in this graphic 
display is the correlogram of the $i$th component series, given by 
$\{(h, \hat{\rho}_{ii}(h)) : h = 0, 1, 2, \ldots\}$. For the off-diagonal plots containing the estimates of cross-correlation 
there are various possible presentations and we will consider the following 
convention: for $i < j$ we plot $\{(h, \hat{\rho}_{ij}(h)) : h = 0, 1, 2, \ldots\}$; for $i > j$ we plot 
$\{(-h, \hat{\rho}_{ij}(h)) : h = 0, 1, 2, \ldots\}$. An interpretation of the meaning of the 
off-diagonal pictures is given in Example 4.36. 

It can be shown that for causal processes driven by multivariate strict white noise 
innovations (see Section 4.5.3) the estimates that comprise the components of the 
sample correlation matrix function $\hat{P}(h)$ are consistent estimates of the underlying 
theoretical quantities. For example, if the data themselves are from an SWN, then 
the cross-correlation estimators $\hat{\rho}_{ij}(h)$ for $h \neq 0$ converge to zero as the sample 
size is increased. However, results concerning the asymptotic distribution of 
cross-correlation estimates are, in general, more complicated than the univariate result for 
autocorrelation estimates given in Theorem 4.13. Some relevant theory is found in 
Chapter 11 of Brockwell and Davis (1991) and Chapter 7 of Brockwell and Davis 
(2002). It is standard to plot the off-diagonal pictures with Gaussian confidence 
bands at $(-1.96/\sqrt{n}, 1.96/\sqrt{n})$, but these bands should be used as rough guidance 
for the eye and not relied upon too heavily to draw conclusions. 
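
The sample covariance and correlation matrix functions can be computed directly from their definitions. The sketch below uses plain Python lists to stay self-contained; the function names and the small bivariate data set, in which the second series is the first series delayed by one step, are hypothetical.

```python
from math import sqrt


def sample_cross_covariance(data, h):
    """Sample covariance matrix function at lag h (0 <= h < n).

    data is a list of n observations, each a list of d component values.
    Entry [i][j] estimates cov(X_{t+h,i}, X_{t,j}); the divisor is n, as in the text.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[i] for row in data) / n for i in range(d)]
    gamma = [[0.0] * d for _ in range(d)]
    for t in range(n - h):
        for i in range(d):
            for j in range(d):
                gamma[i][j] += (data[t + h][i] - mean[i]) * (data[t][j] - mean[j])
    return [[value / n for value in row] for row in gamma]


def sample_cross_correlation(data, h):
    """Sample correlation matrix function: rescale by the lag-0 standard deviations."""
    gamma0 = sample_cross_covariance(data, 0)
    gamma_h = sample_cross_covariance(data, h)
    d = len(gamma0)
    scale = [sqrt(gamma0[i][i]) for i in range(d)]
    return [[gamma_h[i][j] / (scale[i] * scale[j]) for j in range(d)] for i in range(d)]


# Hypothetical data: component 2 at time t equals component 1 at time t - 1.
x1 = [1.0, -1.0, 2.0, -2.0, 3.0, -3.0, 1.0, 0.0]
x2 = [0.0] + x1[:-1]
data = [[a, b] for a, b in zip(x1, x2)]
p0 = sample_cross_correlation(data, 0)
p1 = sample_cross_correlation(data, 1)
```

Here `p1[1][0]`, the estimated correlation between the second component at time $t + 1$ and the first at time $t$, is close to one and exceeds `p1[0][1]`, which illustrates that the sample correlation matrix function at a non-zero lag need not be symmetric.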


Example 4.36 (cross-correlogram of trivariate index returns). In Figure 4.13 
the cross-correlogram of daily log-returns is shown for the Dow Jones, Nikkei and 
Swiss Market indices for 26 July 1996 to 25 July 2001. Although every vector 
observation in this trivariate time series relates to the same trading day, the returns 
are of course not properly synchronized due to time zones; nonetheless, this picture 
shows interpretable lead-lag effects which help us to understand the off-diagonal 
pictures in the cross-correlogram. 

Part (b) of the figure shows estimated correlations between the Dow Jones index 
return on day $t + h$ and the Nikkei index return on day $t$, for $h \geq 0$; clearly these 
estimates are small and lie mainly within the confidence band, with the obvious 
exception of the correlation estimate for returns on the same trading day, $\hat{\rho}_{12}(0) \approx 0.14$. 
Part (d) shows estimated correlations between the Dow Jones index return on day 
$t + h$ and the Nikkei index return on day $t$, for $h \leq 0$; the estimate corresponding to 
$h = -1$ is approximately 0.28 and can be interpreted as showing how the American 
market leads the Japanese market. Comparing parts (c) and (g) we see, unsurprisingly, 
that the American market also leads the Swiss market, so that returns on day 
$t - 1$ in the former are quite strongly correlated with returns on day $t$ in the latter. 

Figure 4.13. Cross-correlogram of daily log-returns of Dow Jones, Nikkei and Swiss 
Market indices for 26 July 1996 to 25 July 2001 (see Example 4.36 for commentary). 


4.5.3 Multivariate ARMA Processes 


We provide a brief excursion into multivariate ARMA models to indicate how 
the ideas of Section 4.2.2 generalize to higher dimensions. For daily data, captur- 
ing multivariate ARMA effects is much less important than capturing multivariate 
volatility effects (and dynamic correlation effects) through multivariate GARCH 
modelling, but, for longer-period returns, the more traditional ARMA processes 
become increasingly useful. In the econometrics literature they are more commonly 
known as vector ARMA (or VARMA) processes. 


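Definition 4.37 (VARMA process). Let $(\boldsymbol{\varepsilon}_t)_{t \in \mathbb{Z}}$ be WN($\mathbf{0}$, $\Sigma_\varepsilon$). The process 
$(\mathbf{X}_t)_{t \in \mathbb{Z}}$ is a zero-mean VARMA($p$, $q$) process if it is a covariance-stationary 
process satisfying difference equations of the form 

$$\mathbf{X}_t - \Phi_1 \mathbf{X}_{t-1} - \cdots - \Phi_p \mathbf{X}_{t-p} = \boldsymbol{\varepsilon}_t + \Theta_1 \boldsymbol{\varepsilon}_{t-1} + \cdots + \Theta_q \boldsymbol{\varepsilon}_{t-q}, \quad \forall t \in \mathbb{Z},$$

for parameter matrices $\Phi_i$ and $\Theta_j$ in $\mathbb{R}^{d \times d}$. $(\mathbf{X}_t)$ is a VARMA process with mean 
$\boldsymbol{\mu}$ if the centred series $(\mathbf{X}_t - \boldsymbol{\mu})_{t \in \mathbb{Z}}$ is a zero-mean VARMA($p$, $q$) process. 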
For practical applications we again consider only causal VARMA processes, 
which are processes where the solution of the defining equations has a representation 
of the form 

$$\mathbf{X}_t = \boldsymbol{\mu} + \sum_{i=0}^{\infty} \Psi_i \boldsymbol{\varepsilon}_{t-i}, \tag{4.55}$$

where $(\Psi_i)_{i \in \mathbb{N}_0}$ is a sequence of matrices in $\mathbb{R}^{d \times d}$ with absolutely summable 
components, meaning that, for any $j$ and $k$, 

$$\sum_{i=0}^{\infty} |\psi_{i,jk}| < \infty. \tag{4.56}$$

As in the univariate case (see Proposition 4.9), it can be verified by direct calculation 
that such linear processes are covariance stationary. Obviously, for all $t$, we have 
$E(\mathbf{X}_t) = \boldsymbol{\mu}$. For $h \geq 0$ the covariance matrix function is given by 

$$\Gamma(t + h, t) = \operatorname{cov}(\mathbf{X}_{t+h}, \mathbf{X}_t) = E\left( \sum_{i=0}^{\infty} \Psi_i \boldsymbol{\varepsilon}_{t+h-i} \sum_{j=0}^{\infty} \boldsymbol{\varepsilon}_{t-j}' \Psi_j' \right).$$

Arguing much as in the univariate case it is easily shown that this depends only on 
$h$ and not on $t$ and that it is given by 

$$\Gamma(h) = \sum_{i=0}^{\infty} \Psi_{i+h} \Sigma_\varepsilon \Psi_i', \quad h = 0, 1, 2, \ldots. \tag{4.57}$$

The correlation matrix function is easily derived from (4.57) and (4.54). 

The requirement that a VARMA process be causal imposes conditions on the
values that the parameter matrices $\Phi_i$ (in particular) and $\Theta_j$ may take. The theory
is remarkably similar to univariate ARMA theory. We will give a single useful
example from the VARMA class; this is the first-order vector autoregressive (or
VAR(1)) model.


Example 4.38 (VAR(1) process). The first-order VAR process satisfies the set of
vector difference equations

$$X_t = \Phi X_{t-1} + \varepsilon_t, \quad \forall t. \qquad (4.58)$$

It is possible to find a causal process satisfying (4.55) and (4.56) that is a solution
of (4.58) if and only if all eigenvalues of the matrix $\Phi$ are less than one in absolute
value. The causal process

$$X_t = \sum_{i=0}^{\infty} \Phi^i \varepsilon_{t-i} \qquad (4.59)$$

is then the unique solution. This solution can be thought of as an infinite-order vector
moving-average process, a so-called VMA($\infty$) process. The covariance matrix
function of this process follows from (4.55) and (4.57) and is

$$\Gamma(h) = \sum_{i=0}^{\infty} \Phi^{i+h} \Sigma_\varepsilon (\Phi^{i})' = \Phi^h \Gamma(0), \quad h = 0, 1, 2, \ldots.$$
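As a numerical illustration of Example 4.38, the following Python sketch checks the eigenvalue condition for causality, approximates $\Gamma(0)$ by truncating the VMA($\infty$) series, and evaluates $\Gamma(h) = \Phi^h \Gamma(0)$. The function names, the truncation length and the parameter values are illustrative assumptions and do not come from the text.

```python
import numpy as np

def is_causal(Phi):
    """True if all eigenvalues of Phi are less than one in absolute value."""
    return bool(np.max(np.abs(np.linalg.eigvals(Phi))) < 1.0)

def var1_gamma0(Phi, Sigma_eps, n_terms=500):
    """Approximate Gamma(0) = sum_i Phi^i Sigma_eps (Phi^i)' by a truncated sum."""
    gamma0 = np.zeros_like(Sigma_eps, dtype=float)
    power = np.eye(Phi.shape[0])          # Phi^0
    for _ in range(n_terms):
        gamma0 += power @ Sigma_eps @ power.T
        power = power @ Phi               # next power of Phi
    return gamma0

def var1_gamma(Phi, Sigma_eps, h, n_terms=500):
    """Covariance matrix function Gamma(h) = Phi^h Gamma(0), h = 0, 1, 2, ..."""
    return np.linalg.matrix_power(Phi, h) @ var1_gamma0(Phi, Sigma_eps, n_terms)

# Made-up parameter values, for illustration only
Phi = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma_eps = np.array([[1.0, 0.3], [0.3, 2.0]])
```

A useful sanity check is that the computed $\Gamma(0)$ satisfies $\Gamma(0) = \Phi\Gamma(0)\Phi' + \Sigma_\varepsilon$, which follows directly from (4.58).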
In practice, full VARMA models are less common than models from the VAR
and VMA subfamilies, one reason being that identifiability problems arise when
estimating parameters. For example, we can have situations where the first-order
VARMA(1, 1) model $X_t - \Phi X_{t-1} = \varepsilon_t + \Theta\varepsilon_{t-1}$ can be rewritten as
$X_t - \Phi^* X_{t-1} = \varepsilon_t + \Theta^*\varepsilon_{t-1}$ for completely different parameter matrices $\Phi^*$ and $\Theta^*$
(see Tsay (2002, p. 323) for an example). Of the two subfamilies, VAR models are easier to estimate.
Fitting options for VAR models range from multivariate least-squares estimation
without strong assumptions concerning the distribution of the driving white noise,
to full maximum likelihood estimation; models combining VAR and multivariate
GARCH features can be estimated using a conditional ML approach in a very similar
manner to that described for univariate models in Section 4.3.4.


Notes and Comments 


Many standard texts on time series also handle the multivariate theory (see, for
example, Brockwell and Davis (1991, 2002) or Hamilton (1994)). A key reference
aimed at an econometrics audience is Lütkepohl (1993). For examples in the area
of finance, see Tsay (2002) and Zivot and Wang (2003).


4.6 Multivariate GARCH Processes 
4.6.1 General Structure of Models 


Definition 4.39. Let $(Z_t)_{t\in\mathbb{Z}}$ be SWN$(0, I_d)$. The process $(X_t)_{t\in\mathbb{Z}}$ is said to be a
multivariate GARCH process if it is strictly stationary and satisfies equations of the
form

$$X_t = \Sigma_t^{1/2} Z_t, \quad t \in \mathbb{Z}, \qquad (4.60)$$

where $\Sigma_t^{1/2} \in \mathbb{R}^{d\times d}$ is the Cholesky factor of a positive-definite matrix $\Sigma_t$, which is
measurable with respect to $\mathcal{F}_{t-1} = \sigma(\{X_s : s \le t - 1\})$, the history of the process
up to time $t - 1$.


Conditional moments. It is easily calculated that a covariance-stationary process
of this type has the multivariate martingale-difference property

$$E(X_t \mid \mathcal{F}_{t-1}) = E(\Sigma_t^{1/2} Z_t \mid \mathcal{F}_{t-1}) = \Sigma_t^{1/2} E(Z_t) = 0,$$

and must therefore be a multivariate white noise process, as argued in Section 4.5.
Moreover, $\Sigma_t$ will be the conditional covariance matrix since

$$\operatorname{cov}(X_t \mid \mathcal{F}_{t-1}) = E(X_t X_t' \mid \mathcal{F}_{t-1}) = \Sigma_t^{1/2} E(Z_t Z_t') (\Sigma_t^{1/2})' = \Sigma_t^{1/2} (\Sigma_t^{1/2})' = \Sigma_t. \qquad (4.61)$$


The conditional covariance matrix $\Sigma_t$ in a multivariate GARCH model corresponds
to the squared volatility $\sigma_t^2$ in a univariate GARCH model. The use of the Cholesky
factor of $\Sigma_t$ to describe the relationship to the driving noise in (4.60) is not important,
and in fact any type of “square root” of $\Sigma_t$ could be used (such as the root derived
from a symmetric decomposition). (The only implication is the way we construct
residuals when fitting the model in practice.) We denote the elements of $\Sigma_t$ by $\sigma_{t,ij}$
and also use the notation $\sigma_{t,i} = \sqrt{\sigma_{t,ii}}$ to denote the conditional standard deviation
(or volatility) of the $i$th component series $(X_{t,i})_{t\in\mathbb{Z}}$.
We recall that we can write $\Sigma_t = \Delta_t P_t \Delta_t$, where

$$\Delta_t = \Delta(\Sigma_t) = \operatorname{diag}(\sigma_{t,1}, \ldots, \sigma_{t,d}), \qquad P_t = \wp(\Sigma_t), \qquad (4.62)$$

using the operator notation defined in (3.5). The diagonal matrix $\Delta_t$ will be known
as the volatility matrix and $P_t$ is known as the conditional correlation matrix. The
art of building multivariate GARCH models is to specify the dependence of $\Sigma_t$ (or
of $\Delta_t$ and $P_t$) on the past in such a way that $\Sigma_t$ always remains symmetric and
positive definite. A covariance matrix must of course be symmetric and positive
semidefinite, and in practice we restrict our attention to the positive-definite case
(which facilitates fitting, since the conditional distribution of $X_t \mid \mathcal{F}_{t-1}$ never has a
singular covariance matrix).


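Unconditional moments. The unconditional covariance matrix $\Sigma$ of a process of
this type is given by

$$\Sigma = \operatorname{cov}(X_t) = E(\operatorname{cov}(X_t \mid \mathcal{F}_{t-1})) + \operatorname{cov}(E(X_t \mid \mathcal{F}_{t-1})) = E(\Sigma_t),$$

from which it can be calculated that the unconditional correlation matrix $P$ has
elements

$$\rho_{ij} = \frac{E(\sigma_{t,ij})}{\sqrt{E(\sigma_{t,ii})E(\sigma_{t,jj})}} = \frac{E(\rho_{t,ij}\sigma_{t,i}\sigma_{t,j})}{\sqrt{E(\sigma_{t,i}^2)E(\sigma_{t,j}^2)}}, \qquad (4.63)$$

which is in general difficult to evaluate and is usually not simply the expectation of
the conditional correlation matrix.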


Innovations. In practical work the innovations are generally taken to be from
either a multivariate Gaussian distribution ($Z_t \sim N_d(0, I_d)$) or, more realistically
for daily returns, an appropriately scaled spherical multivariate $t$ distribution
($Z_t \sim t_d(\nu, 0, (\nu - 2)I_d/\nu)$). Any distribution with mean zero and covariance
matrix $I_d$ is permissible, and appropriate members of the normal mixture family
of Section 3.2 or the spherical family of Section 3.3.1 may be considered.
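To make the scaling concrete, the sketch below draws spherical $t$ innovations with covariance matrix $I_d$ using the standard normal variance mixture construction $Z = \sqrt{(\nu - 2)/W}\, N$ with $W \sim \chi^2_\nu$ independent of $N \sim N_d(0, I_d)$. The function name and the use of NumPy's random generator are assumptions made for illustration.

```python
import numpy as np

def scaled_t_innovations(n, d, nu, rng):
    """Draw n innovations from t_d(nu, 0, (nu - 2) I_d / nu), which has covariance I_d.

    Requires nu > 2 so that the covariance matrix exists.
    """
    if nu <= 2:
        raise ValueError("nu must exceed 2 for a finite covariance matrix")
    normals = rng.standard_normal((n, d))      # N ~ N_d(0, I_d)
    w = rng.chisquare(nu, size=(n, 1))         # one mixing variable per observation
    return normals * np.sqrt((nu - 2.0) / w)   # same scale for all d components
```

Because a single mixing variable is shared by all components of each draw, the components are uncorrelated but not independent.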


Presentation of models. In the following sections we present some of the more 
important multivariate GARCH specifications. In doing this we concentrate on the 
following aspects of the models. 


• The form of the dynamic equations, with economic arguments and criticisms
where appropriate.


• The conditions required to guarantee that the conditional covariance matrix
$\Sigma_t$ remains positive definite. Other mathematical properties of these models,
such as conditions for covariance stationarity, are difficult to derive with
full mathematical rigour; references in Notes and Comments contain further
information.

• The parsimoniousness of the parametrization. A major problem with most
multivariate GARCH specifications is that the number of parameters tends to
explode with the dimension of the model, making them unsuitable for analyses
of many risk factors.


• Simple intuitive fitting methods where available. All models can be fitted by a
general global-maximization approach described in Section 4.6.4 but certain
models lend themselves to estimation in stages, particularly the models of
Section 4.6.2.


4.6.2 Models for Conditional Correlation 


In this section we present models which focus on specifying the conditional correlation
matrix $P_t$ while allowing volatilities to be described by univariate GARCH
models; we begin with a popular and relatively parsimonious model where $P_t$ is
assumed to be constant for all $t$.


Constant conditional correlation (CCC). 


Definition 4.40. The process $(X_t)_{t\in\mathbb{Z}}$ is a CCC-GARCH process if it is a process
with the general structure given in Definition 4.39 such that the conditional
covariance matrix is of the form $\Sigma_t = \Delta_t P_c \Delta_t$, where


(i) $P_c$ is a constant, positive-definite correlation matrix; and


(ii) $\Delta_t$ is a diagonal volatility matrix with elements $\sigma_{t,k}$ satisfying

$$\sigma_{t,k}^2 = \alpha_{k0} + \sum_{i=1}^{p_k} \alpha_{ki} X_{t-i,k}^2 + \sum_{j=1}^{q_k} \beta_{kj} \sigma_{t-j,k}^2, \quad k = 1, \ldots, d, \qquad (4.64)$$

where $\alpha_{k0} > 0$, $\alpha_{ki} \ge 0$, $i = 1, \ldots, p_k$, $\beta_{kj} \ge 0$, $j = 1, \ldots, q_k$.


The CCC-GARCH specification represents a simple way of combining univariate
GARCH processes. This can be seen by observing that in a CCC-GARCH model
observations and innovations are connected by equations $X_t = \Delta_t P_c^{1/2} Z_t$, which
may be rewritten as $X_t = \Delta_t \tilde{Z}_t$ for an SWN$(0, P_c)$ process $(\tilde{Z}_t)_{t\in\mathbb{Z}}$. Clearly, the
component processes are univariate GARCH.
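The representation $X_t = \Delta_t \tilde{Z}_t$ translates directly into a simulation recipe. The following Python sketch simulates a first-order CCC-GARCH process ($p_k = q_k = 1$) with Gaussian innovations; the function name, the burn-in length and the choice to start each variance at its unconditional level are assumptions made for illustration.

```python
import numpy as np

def simulate_ccc_garch(n, alpha0, alpha1, beta1, P_c, rng, burn_in=500):
    """Simulate X_t = Delta_t * Ztilde_t with Ztilde_t ~ SWN(0, P_c) (Gaussian here).

    alpha0, alpha1, beta1 are length-d arrays of GARCH(1,1) coefficients as in (4.64).
    Returns the simulated returns and the volatilities, each of shape (n, d).
    """
    alpha0, alpha1, beta1 = (np.asarray(a, dtype=float) for a in (alpha0, alpha1, beta1))
    if np.any(alpha1 + beta1 >= 1.0):
        raise ValueError("covariance stationarity requires alpha1 + beta1 < 1")
    d = alpha0.size
    chol = np.linalg.cholesky(P_c)                # P_c^{1/2}
    sigma2 = alpha0 / (1.0 - alpha1 - beta1)      # start at unconditional variances
    X = np.empty((n + burn_in, d))
    vols = np.empty((n + burn_in, d))
    for t in range(n + burn_in):
        z_tilde = chol @ rng.standard_normal(d)   # correlated innovation
        vols[t] = np.sqrt(sigma2)
        X[t] = vols[t] * z_tilde                  # Delta_t is diagonal
        sigma2 = alpha0 + alpha1 * X[t] ** 2 + beta1 * sigma2
    return X[burn_in:], vols[burn_in:]
```

Dividing the simulated returns by the volatilities recovers the SWN$(0, P_c)$ series, so its sample correlation matrix should be close to $P_c$.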


Proposition 4.41. The CCC-GARCH model is well defined in the sense that $\Sigma_t$ is
almost surely positive definite for all $t$. Moreover, it is covariance stationary if and
only if $\sum_{i=1}^{p_k} \alpha_{ki} + \sum_{j=1}^{q_k} \beta_{kj} < 1$ for $k = 1, \ldots, d$.


Proof. For a vector $v \ne 0$ in $\mathbb{R}^d$ we have

$$v'\Sigma_t v = (\Delta_t v)' P_c (\Delta_t v) > 0,$$

since $P_c$ is positive definite and the strict positivity of the individual volatility
processes ensures that $\Delta_t v \ne 0$ for all $t$.

If $(X_t)_{t\in\mathbb{Z}}$ is covariance stationary, then each component series $(X_{t,k})_{t\in\mathbb{Z}}$ is a
covariance-stationary GARCH process for which a necessary and sufficient condition
is $\sum_{i=1}^{p_k} \alpha_{ki} + \sum_{j=1}^{q_k} \beta_{kj} < 1$ by Proposition 4.21. Conversely, if the component
series are covariance stationary, then for all $i$ and $j$ the Cauchy-Schwarz inequality
implies

$$\sigma_{ij} = E(\sigma_{t,ij}) = \rho_{ij} E(\sigma_{t,i}\sigma_{t,j}) \le \rho_{ij} \sqrt{E(\sigma_{t,i}^2) E(\sigma_{t,j}^2)} < \infty.$$

Since $(X_t)_{t\in\mathbb{Z}}$ is a multivariate martingale difference with finite, non-time-dependent
second moments $\sigma_{ij}$, it is a covariance-stationary white noise.


The CCC model is often a useful starting point from which to proceed to more 
complex models. In some empirical settings it gives an adequate performance, but 
it is generally considered that the constancy of conditional correlation in this model 
is an unrealistic feature and that the impact of news on financial markets requires 
models that allow a dynamic evolution of conditional correlation as well as a dynamic 
evolution of volatilities. A further criticism of the model (which applies in fact to 
the majority of MGARCH specifications) is the fact that the individual volatility 
dynamics (4.64) do not allow for the possibility that large returns in one component 
series at a particular point in time can contribute to increased volatility of another 
component time series at future points in time. 

To describe a simple method of fitting the CCC model we introduce the notion of
a devolatized process. For any multivariate time series process $X_t$, the devolatized
process is the process $Y_t = \Delta_t^{-1} X_t$, where $\Delta_t$ is, as usual, the diagonal matrix of
volatilities. In the case of a CCC model it is easily seen that the devolatized process
$(Y_t)_{t\in\mathbb{Z}}$ is an SWN$(0, P_c)$ process.

This structure suggests a simple two-stage fitting method in which we first estimate
the individual volatility processes for the component series by fitting univariate
GARCH processes; note that, although we have specified in Definition 4.40 that the
individual volatilities should follow standard GARCH models, we could of course
extend the model to allow any of the univariate models in Section 4.3.3, such as
GARCH with leverage or threshold GARCH. In a second stage we construct an
estimate of the devolatized process by taking $\hat{Y}_t = \hat{\Delta}_t^{-1} X_t$, where $\hat{\Delta}_t$ is the estimate
of $\Delta_t$; in other words, we collect the standardized residuals from the univariate
GARCH models. If the CCC-GARCH model is adequate, then the $\hat{Y}_t$ data should
behave like a realization from an SWN$(0, P_c)$ process and this can be investigated
with the correlogram and cross-correlogram applied to raw and absolute values.
Assuming the adequacy of the model, the conditional correlation matrix $P_c$ can then
be estimated from the standardized residuals using methods from Chapter 3.
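The second stage of this procedure can be sketched as follows, assuming the GARCH(1,1) coefficients of each component series have already been estimated in the first stage (that estimation step is not shown). The function names and the initialization of the variance recursion at its unconditional level are illustrative assumptions, and the plain sample correlation matrix stands in for the estimators of Chapter 3.

```python
import numpy as np

def garch11_filter(x, alpha0, alpha1, beta1):
    """Volatilities from the GARCH(1,1) recursion, started at the unconditional variance."""
    x = np.asarray(x, dtype=float)
    sigma2 = np.empty_like(x)
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)
    for t in range(1, len(x)):
        sigma2[t] = alpha0 + alpha1 * x[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return np.sqrt(sigma2)

def ccc_second_stage(X, params):
    """Devolatize each column of X and estimate P_c from the standardized residuals.

    params[k] = (alpha0, alpha1, beta1) for component k (assumed fitted beforehand).
    """
    vols = np.column_stack(
        [garch11_filter(X[:, k], *params[k]) for k in range(X.shape[1])]
    )
    Y_hat = X / vols                            # estimated devolatized process
    P_hat = np.corrcoef(Y_hat, rowvar=False)    # sample correlation matrix
    return Y_hat, P_hat
```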

A special case of CCC-GARCH which we call a pure diagonal model occurs when
$P_c = I_d$. A covariance-stationary model of this kind is a multivariate white noise
where the contemporaneous components $X_{t,i}$ and $X_{t,j}$ are also uncorrelated for
$i \ne j$. Whether they are independent or not depends on further assumptions about
the driving SWN$(0, I_d)$ process: if the innovations have independent components, as
would be the case if they were multivariate Gaussian, then the component series are
independent; however, if, for example, $Z_t \sim t_d(\nu, 0, ((\nu - 2)/\nu) I_d)$, the component
processes are dependent.
Dynamic conditional correlation (DCC). This model generalizes the CCC model
to allow conditional correlations to evolve dynamically according to a relatively
parsimonious scheme, but is constructed in a way that still allows estimation in
stages using univariate GARCH models. Its formal analysis as a stochastic process
is difficult due to the use of the correlation matrix extraction operator $\wp$ in its
definition.


Definition 4.42. The process $(X_t)_{t\in\mathbb{Z}}$ is a DCC-GARCH process if it is a process
with the general structure given in Definition 4.39, where the volatilities comprising
$\Delta_t$ follow univariate GARCH specifications as in (4.64) and the conditional
correlation matrices $P_t$ satisfy, for $t \in \mathbb{Z}$, the equations

$$P_t = \wp\Big(\Big(1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j\Big) P_c + \sum_{i=1}^{p} \alpha_i Y_{t-i} Y_{t-i}' + \sum_{j=1}^{q} \beta_j P_{t-j}\Big), \qquad (4.65)$$

where $P_c$ is a positive-definite correlation matrix, $\wp$ is the operator in (3.5),
$Y_t = \Delta_t^{-1} X_t$ denotes the devolatized process, and the coefficients satisfy $\alpha_i \ge 0$, $\beta_j \ge 0$
and $\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1$.


Observe first that if all the $\alpha_i$ and $\beta_j$ coefficients in (4.65) are zero, then the model
reduces to the CCC model. If one makes an analogy with a covariance-stationary
univariate GARCH model with unconditional variance $\sigma^2$, for which the volatility
equation can be written

$$\sigma_t^2 = \Big(1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j\Big)\sigma^2 + \sum_{i=1}^{p} \alpha_i X_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2,$$

then the correlation matrix $P_c$ in (4.65) can be thought of as representing the long-run
correlation structure. Although this matrix could be estimated by fitting the
DCC model to data by ML estimation in one step, it is quite common to estimate it
using an empirical correlation matrix calculated from the devolatized data, as in the
estimation of the CCC model.

Observe also that the dynamic equation (4.65) preserves the positive definiteness
of $P_t$. If we define

$$Q_t := \Big(1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j\Big) P_c + \sum_{i=1}^{p} \alpha_i Y_{t-i} Y_{t-i}' + \sum_{j=1}^{q} \beta_j P_{t-j},$$

and assume that $P_{t-q}, \ldots, P_{t-1}$ are positive definite, then it follows that, for a vector
$v \ne 0$ in $\mathbb{R}^d$, we have

$$v' Q_t v = \Big(1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j\Big) v' P_c v + \sum_{i=1}^{p} \alpha_i (v' Y_{t-i})^2 + \sum_{j=1}^{q} \beta_j v' P_{t-j} v > 0,$$

since the first term is strictly positive and the second and third terms are non-negative.
If $Q_t$ is positive definite, then so is $P_t$.
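The recursion (4.65) and the positive-definiteness argument above can be sketched in Python for the first-order case ($p = q = 1$). The function names and the choice to start the recursion at $P_c$ are assumptions made for illustration; the operator $\wp$ is implemented by rescaling a matrix with the square roots of its diagonal.

```python
import numpy as np

def corr_from_cov(Q):
    """Operator that extracts a correlation matrix from a covariance-type matrix."""
    s = np.sqrt(np.diag(Q))
    return Q / np.outer(s, s)

def dcc_correlations(Y, P_c, alpha, beta):
    """Conditional correlation matrices P_t of a first-order DCC model.

    Y holds the devolatized data (one row per time point); the recursion is
    started at P_c, which is an illustrative initialization.
    """
    n, d = Y.shape
    P = np.empty((n, d, d))
    P[0] = P_c
    for t in range(1, n):
        Q_t = ((1.0 - alpha - beta) * P_c
               + alpha * np.outer(Y[t - 1], Y[t - 1])
               + beta * P[t - 1])
        P[t] = corr_from_cov(Q_t)   # positive definite whenever Q_t is
    return P
```

Setting `alpha = beta = 0` returns $P_c$ at every time point, which is the CCC special case noted above.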
The usual estimation method for the DCC model is as follows.


(1) Fit univariate GARCH-type models to the component series to estimate the
volatility matrix $\Delta_t$. Form an estimated realization of the devolatized process
by taking $\hat{Y}_t = \hat{\Delta}_t^{-1} X_t$.


(2) Estimate $P_c$ by taking the sample correlation matrix of the devolatized data
(or better still some robust estimator of correlation).


(3) Estimate the remaining parameters $\alpha_i$ and $\beta_j$ in equation (4.65) by fitting
a model with structure $Y_t = P_t^{1/2} Z_t$ to the devolatized data. We leave this
step vague for the time being and note that this will be a simple application
of the methodology for fitting general multivariate GARCH models in
Section 4.6.4; in a first-order model ($p = q = 1$), there will only be two
remaining parameters to estimate.


4.6.3 Models for Conditional Covariance 


The models of this section specify explicitly a dynamic structure for the conditional
covariance matrix $\Sigma_t$. These models are not designed for multiple-stage estimation
based on univariate GARCH estimation procedures.


Vector GARCH Models (VEC and DVEC). The most general vector GARCH 
model—the VEC model—has too many parameters for practical purposes and our 
task will be to simplify the model by imposing various restrictions on parameter 
matrices. 


Definition 4.43 (VEC model). The process $(X_t)_{t\in\mathbb{Z}}$ is a VEC process if it has
the general structure given in Definition 4.39, and the dynamics of the conditional
covariance matrix $\Sigma_t$ are given by the equations

$$\operatorname{vech}(\Sigma_t) = a_0 + \sum_{i=1}^{p} A_i \operatorname{vech}(X_{t-i} X_{t-i}') + \sum_{j=1}^{q} B_j \operatorname{vech}(\Sigma_{t-j}), \qquad (4.66)$$

for a vector $a_0 \in \mathbb{R}^{d(d+1)/2}$ and matrices $A_i$ and $B_j$ in $\mathbb{R}^{(d(d+1)/2)\times(d(d+1)/2)}$.


In this definition “vech” denotes the vector half operator, which stacks the 
columns of the lower triangle of a symmetric matrix in a single column vector 
of length d(d + 1)/2. Thus (4.66) should be understood as specifying the dynam- 
ics for the lower-triangular portion of the conditional covariance matrix, and the 
remaining elements of the matrix are determined by symmetry. 

In this very general form the model has $(1 + (p + q)d(d + 1)/2)\,d(d + 1)/2$
parameters; this number grows rapidly with dimension so that even a first-order
trivariate model has 78 parameters. The most common simplification has been to restrict attention to
cases when $A_i$ and $B_j$ are diagonal matrices, which gives us the diagonal VEC or
DVEC model. This special case can be written very elegantly in terms of a different
kind of matrix product, namely the Hadamard product, denoted “$\circ$”, which signifies
element-by-element multiplication of two matrices of the same size. We obtain the
representation

$$\Sigma_t = A_0 + \sum_{i=1}^{p} A_i \circ (X_{t-i} X_{t-i}') + \sum_{j=1}^{q} B_j \circ \Sigma_{t-j}, \qquad (4.67)$$

where $A_0$ and the $A_i$ and $B_j$ must all be symmetric matrices in $\mathbb{R}^{d\times d}$ such that $A_0$ has
positive diagonal elements and all other matrices have non-negative diagonal elements
(standard univariate GARCH assumptions). This representation emphasizes
structural similarities with the univariate GARCH model of Definition 4.20.
To understand better the dynamic implications of (4.67), consider a bivariate
model of order (1, 1) and write $a_{0,ij}$, $a_{1,ij}$ and $b_{ij}$ for the elements of $A_0$, $A_1$ and
$B_1$, respectively. The model amounts to the three simple equations

$$\begin{aligned}
\sigma_{t,1}^2 &= a_{0,11} + a_{1,11} X_{t-1,1}^2 + b_{11}\sigma_{t-1,1}^2, \\
\sigma_{t,12} &= a_{0,12} + a_{1,12} X_{t-1,1} X_{t-1,2} + b_{12}\sigma_{t-1,12}, \qquad (4.68) \\
\sigma_{t,2}^2 &= a_{0,22} + a_{1,22} X_{t-1,2}^2 + b_{22}\sigma_{t-1,2}^2.
\end{aligned}$$


The volatilities of the two component series ($\sigma_{t,1}$ and $\sigma_{t,2}$) follow univariate GARCH
updating patterns, and the conditional covariance $\sigma_{t,12}$ has a similar structure driven
by the products of the lagged values $X_{t-1,1} X_{t-1,2}$. As for the CCC and DCC models,
the volatility of a single component series is only driven by large lagged values of
that series and cannot be directly affected by large lagged values in another series;
the more general but overparametrized VEC model would allow this feature.
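A one-step DVEC update of order (1, 1) is a single line once the Hadamard product in (4.67) is read as element-by-element multiplication. The sketch below uses a hypothetical function name; NumPy's `*` operator on arrays is the element-wise product.

```python
import numpy as np

def dvec_update(A0, A1, B1, x_prev, Sigma_prev):
    """One step of the first-order DVEC recursion (4.67).

    The `*` operator on NumPy arrays is the Hadamard product, so each entry of
    the result depends only on the matching entries of the inputs.
    """
    return A0 + A1 * np.outer(x_prev, x_prev) + B1 * Sigma_prev
```

In the bivariate case the three distinct entries of the result are exactly the three equations in (4.68); in particular the (1, 1) entry involves only the first component of `x_prev`.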

The requirement that $\Sigma_t$ in (4.67) should be a proper positive-definite covariance
matrix does impose conditions on the $A_0$, $A_i$ and $B_j$ matrices that we have not
yet discussed. In practice, in some software implementations of this model, formal
conditions are not imposed, other than that the matrices should be symmetric with
non-negative diagonal elements; the positive definiteness of the resulting estimates
of the conditional covariance matrices can be checked after model fitting.

However, a sufficient condition for $\Sigma_t$ to be almost surely positive definite is
that $A_0$ should be positive definite and the matrices $A_1, \ldots, A_p, B_1, \ldots, B_q$ should
all be positive semidefinite (see Notes and Comments) and this condition is easy to
impose. We can constrain all parameter matrices to have a form based on a Cholesky
decomposition; that is we can parametrize the model in terms of lower-triangular
Cholesky factor matrices $A_0^{1/2}$, $A_i^{1/2}$ and $B_j^{1/2}$ satisfying

$$A_0 = A_0^{1/2} (A_0^{1/2})', \qquad A_i = A_i^{1/2} (A_i^{1/2})', \qquad B_j = B_j^{1/2} (B_j^{1/2})'. \qquad (4.69)$$

Because the sufficient condition only prescribes that $A_1, \ldots, A_p$ and $B_1, \ldots, B_q$
should be positive semidefinite, we can in fact also consider much simpler
parametrizations, such as

$$A_0 = A_0^{1/2} (A_0^{1/2})', \qquad A_i = a_i a_i', \qquad B_j = b_j b_j', \qquad (4.70)$$

where $a_i$ and $b_j$ are vectors in $\mathbb{R}^d$. An even cruder model, satisfying the requirement
of positive definiteness of $\Sigma_t$, would be

$$A_0 = A_0^{1/2} (A_0^{1/2})', \qquad A_i = a_i I_d, \qquad B_j = b_j I_d, \qquad (4.71)$$

where $a_i$ and $b_j$ are simply positive constants. In fact the specifications of the
multivariate ARCH and GARCH effects in (4.69)-(4.71) can be mixed and matched
in obvious ways.


The BEKK model of Baba, Engle, Kroner and Kraft. The next family of models
has the great advantage that its construction ensures the positive definiteness of
$\Sigma_t$ without the need for further conditions.


Definition 4.44. The process $(X_t)_{t\in\mathbb{Z}}$ is a BEKK process if it has the general structure
given in Definition 4.39, and if the conditional covariance matrix $\Sigma_t$ satisfies,
for all $t \in \mathbb{Z}$,

$$\Sigma_t = A_0 + \sum_{i=1}^{p} A_i X_{t-i} X_{t-i}' A_i' + \sum_{j=1}^{q} B_j \Sigma_{t-j} B_j', \qquad (4.72)$$

where all coefficient matrices are in $\mathbb{R}^{d\times d}$ and $A_0$ is symmetric and positive definite.


Proposition 4.45. In the BEKK model (4.72), the conditional covariance matrix
$\Sigma_t$ is almost surely positive definite for all $t$.


Proof. Consider a first-order model for simplicity. For a vector $v \ne 0$ in $\mathbb{R}^d$ we
have

$$v'\Sigma_t v = v' A_0 v + (v' A_1 X_{t-1})^2 + (B_1' v)' \Sigma_{t-1} (B_1' v) > 0,$$

since the first term is strictly positive and the second and third terms are non-negative.


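To gain an understanding of the BEKK model it is again useful to consider the
bivariate special case of order (1, 1) and to consider the dynamics that are implied
while comparing these with equations (4.68):

$$\begin{aligned}
\sigma_{t,1}^2 &= a_{0,11} + a_{1,11}^2 X_{t-1,1}^2 + 2a_{1,11}a_{1,12} X_{t-1,1} X_{t-1,2} + a_{1,12}^2 X_{t-1,2}^2 \\
&\quad + b_{11}^2 \sigma_{t-1,1}^2 + 2b_{11}b_{12}\sigma_{t-1,12} + b_{12}^2 \sigma_{t-1,2}^2, \qquad (4.73) \\
\sigma_{t,12} &= a_{0,12} + (a_{1,11}a_{1,22} + a_{1,12}a_{1,21}) X_{t-1,1} X_{t-1,2} \\
&\quad + a_{1,11}a_{1,21} X_{t-1,1}^2 + a_{1,22}a_{1,12} X_{t-1,2}^2 \\
&\quad + (b_{11}b_{22} + b_{12}b_{21})\sigma_{t-1,12} + b_{11}b_{21}\sigma_{t-1,1}^2 + b_{22}b_{12}\sigma_{t-1,2}^2, \qquad (4.74) \\
\sigma_{t,2}^2 &= a_{0,22} + a_{1,22}^2 X_{t-1,2}^2 + 2a_{1,22}a_{1,21} X_{t-1,1} X_{t-1,2} + a_{1,21}^2 X_{t-1,1}^2 \\
&\quad + b_{22}^2 \sigma_{t-1,2}^2 + 2b_{22}b_{21}\sigma_{t-1,12} + b_{21}^2 \sigma_{t-1,1}^2. \qquad (4.75)
\end{aligned}$$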
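The bivariate expansions can be checked numerically against the matrix recursion. The sketch below assumes the convention $\Sigma_t = A_0 + A_1 X_{t-1} X_{t-1}' A_1' + B_1 \Sigma_{t-1} B_1'$, which is the one consistent with the coefficients appearing in (4.73)-(4.75); the function names are hypothetical.

```python
import numpy as np

def bekk_update(A0, A1, B1, x_prev, Sigma_prev):
    """One step of a first-order BEKK recursion under the stated convention."""
    Ax = A1 @ x_prev
    return A0 + np.outer(Ax, Ax) + B1 @ Sigma_prev @ B1.T

def bekk_var1_expanded(A0, A1, B1, x_prev, Sigma_prev):
    """Conditional variance of the first series written out as in (4.73)."""
    x1, x2 = x_prev
    s11, s12, s22 = Sigma_prev[0, 0], Sigma_prev[0, 1], Sigma_prev[1, 1]
    return (A0[0, 0]
            + A1[0, 0] ** 2 * x1 ** 2 + 2 * A1[0, 0] * A1[0, 1] * x1 * x2
            + A1[0, 1] ** 2 * x2 ** 2
            + B1[0, 0] ** 2 * s11 + 2 * B1[0, 0] * B1[0, 1] * s12
            + B1[0, 1] ** 2 * s22)

def bekk_cov12_expanded(A0, A1, B1, x_prev, Sigma_prev):
    """Conditional covariance written out as in (4.74)."""
    x1, x2 = x_prev
    s11, s12, s22 = Sigma_prev[0, 0], Sigma_prev[0, 1], Sigma_prev[1, 1]
    return (A0[0, 1]
            + (A1[0, 0] * A1[1, 1] + A1[0, 1] * A1[1, 0]) * x1 * x2
            + A1[0, 0] * A1[1, 0] * x1 ** 2 + A1[1, 1] * A1[0, 1] * x2 ** 2
            + (B1[0, 0] * B1[1, 1] + B1[0, 1] * B1[1, 0]) * s12
            + B1[0, 0] * B1[1, 0] * s11 + B1[1, 1] * B1[0, 1] * s22)
```

The cross term in `bekk_var1_expanded` shows directly how a large lagged value of the second component feeds into the variance of the first.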
From (4.73) it follows that we now have a model where a large lagged value of
the second component $X_{t-1,2}$ can influence the volatility of the first series $\sigma_{t,1}$.
The BEKK model has more parameters than the DVEC model and appears to have
much richer dynamics. Note, however, that the DVEC model cannot be obtained as
a special case of the BEKK model as we have defined it. To eliminate all crossover
effects in the conditional variance equations of the BEKK model in (4.73) and (4.75)
we would have to set the off-diagonal terms $a_{1,12}$, $a_{1,21}$, $b_{12}$ and $b_{21}$ to be zero and the
parameters governing the individual volatilities would also govern the conditional
covariance $\sigma_{t,12}$ in (4.74).


Table 4.3. Summary of numbers of parameters in various multivariate GARCH models: in
CCC it is assumed that the numbers of ARCH and GARCH terms for all volatility equations
are, respectively, p and q; in DCC it is assumed that the conditional correlation equation has
p + q parameters. The second column gives the general formula; the final columns give the
numbers for models of dimensions 2, 5 and 10 when p = q = 1. Additional parameters in
the innovation distribution are not considered.

Model               Parameter count                     d = 2   d = 5   d = 10
VEC                 d(d+1)(1 + (p+q)d(d+1)/2)/2            21     465     6105
BEKK                d(d+1)/2 + d^2(p+q)                    11      65      255
DVEC as in (4.69)   d(d+1)(1 + p + q)/2                     9      45      165
DCC                 d(d+1)/2 + (d+1)(p+q)                   9      27       77
CCC                 d(d+1)/2 + d(p+q)                       7      25       75
DVEC as in (4.70)   d(d+1)/2 + d(p+q)                       7      25       75
DVEC as in (4.71)   d(d+1)/2 + (p+q)                        5      17       57


Remark 4.46. A broader definition of the BEKK class, which does subsume all
DVEC models, was originally given by Engle and Kroner (1995). In this definition
we have

$$\Sigma_t = A_0 A_0' + \sum_{k=1}^{K}\sum_{i=1}^{p} A_{k,i} X_{t-i} X_{t-i}' A_{k,i}' + \sum_{k=1}^{K}\sum_{j=1}^{q} B_{k,j} \Sigma_{t-j} B_{k,j}',$$

where $\tfrac{1}{2}d(d+1) > K \ge 1$ and the choice of $K$ determines the richness of the model.
This model class is of largely theoretical interest and tends to be too complex for
practical applications; even the case $K = 1$ is difficult to fit in higher dimensions.


In Table 4.3 we have summarized the numbers of parameters in these models. 
Broad conclusions concerning the practical implications are as follows: the general 
VEC model is of purely theoretical interest; the BEKK and general DVEC models 
are for very low-dimensional use; the remaining models are the most practically 
useful. 
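The formulas in Table 4.3 are easy to evaluate for other dimensions and orders. The following sketch (the function name and dictionary keys are illustrative) reproduces the counts from the general formulas, writing $m = d(d+1)/2$ for the number of distinct elements of a symmetric $d \times d$ matrix.

```python
def mgarch_param_counts(d, p=1, q=1):
    """Parameter counts from the general formulas of Table 4.3.

    Parameters of the innovation distribution are not counted.
    """
    m = d * (d + 1) // 2   # distinct elements of a symmetric d x d matrix
    return {
        "VEC": m * (1 + (p + q) * m),
        "BEKK": m + d * d * (p + q),
        "DVEC (4.69)": m * (1 + p + q),
        "DCC": m + (d + 1) * (p + q),
        "CCC": m + d * (p + q),
        "DVEC (4.70)": m + d * (p + q),
        "DVEC (4.71)": m + (p + q),
    }
```

For example, `mgarch_param_counts(3)["VEC"]` gives the 78 parameters of the first-order trivariate VEC model mentioned earlier.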


4.6.4 Fitting Multivariate GARCH Models 


Model fitting. We have already given notes on fitting some models in stages and 
it should be stressed that in the high-dimensional applications of risk management 
this may in fact be the only feasible strategy. Where interest centres on a multivariate 
risk-factor return series of more modest dimension (perhaps less than 10), we can 
attempt to fit multivariate GARCH models by maximizing an appropriate likelihood 
with respect to all parameters in a single step. The procedure follows from the method 
for univariate time series described in Section 4.3.4. 

The method of building a likelihood for a generic multivariate GARCH model
$X_t = \Sigma_t^{1/2} Z_t$ is completely analogous to the univariate case; consider again a
first-order model ($p = q = 1$) for simplicity and assume that our data are labelled
$X_0, X_1, \ldots, X_n$. A conditional likelihood is based on the conditional joint density
of $X_1, \ldots, X_n$, given $X_0$ and an initial value $\Sigma_0$ for the conditional covariance
matrix. This conditional joint density is

$$f_{X_1,\ldots,X_n \mid X_0, \Sigma_0}(x_1, \ldots, x_n \mid x_0, \Sigma_0) = \prod_{t=1}^{n} f_{X_t \mid X_{t-1},\ldots,X_0,\Sigma_0}(x_t \mid x_{t-1}, \ldots, x_0, \Sigma_0).$$

If we denote the multivariate innovation density of $Z_t$ by $g(z)$, then we have

$$f_{X_t \mid X_{t-1},\ldots,X_0,\Sigma_0}(x_t \mid x_{t-1}, \ldots, x_0, \Sigma_0) = |\Sigma_t|^{-1/2} g(\Sigma_t^{-1/2} x_t),$$

where $\Sigma_t$ is a matrix-valued function of $x_{t-1}, \ldots, x_0$ and $\Sigma_0$. Most common choices
of $g(z)$ are in the spherical family so that by (3.46) we have $g(z) = h(z'z)$ for some
function $h$ of a scalar variable (known as a density generator), yielding a conditional
likelihood of the form

$$L(\theta; X_1, \ldots, X_n) = \prod_{t=1}^{n} |\Sigma_t|^{-1/2} h(X_t' \Sigma_t^{-1} X_t),$$

where all parameters appearing in the volatility equation and the innovation distribution
are collected in $\theta$. It would of course be possible to add a constant mean term
or a conditional mean term with, say, vector autoregressive structure to the model
and to adapt the likelihood accordingly.
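For Gaussian innovations the density generator is $h(s) = (2\pi)^{-d/2} e^{-s/2}$, and the conditional log-likelihood can be accumulated one time step at a time. The sketch below assumes a first-order model whose covariance dynamics are supplied as a callable `update(x_prev, Sigma_prev)`; this interface and the function name are assumptions made for illustration, and a generic numerical optimizer would still be needed to maximize the result over $\theta$.

```python
import numpy as np

def gaussian_cond_loglik(X, Sigma0, update):
    """Conditional Gaussian log-likelihood of x_1, ..., x_n given x_0 and Sigma0.

    X has rows x_0, x_1, ..., x_n; update(x_prev, Sigma_prev) returns Sigma_t.
    """
    d = X.shape[1]
    Sigma_prev = Sigma0
    loglik = 0.0
    for t in range(1, X.shape[0]):
        Sigma_t = update(X[t - 1], Sigma_prev)
        L = np.linalg.cholesky(Sigma_t)              # Sigma_t = L L'
        z = np.linalg.solve(L, X[t])                 # z'z = x_t' Sigma_t^{-1} x_t
        log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log |Sigma_t|
        loglik += -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)
        Sigma_prev = Sigma_t
    return loglik
```

Working with the Cholesky factor gives both the determinant and the quadratic form without forming an explicit inverse, and it fails loudly if some $\Sigma_t$ is not positive definite.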

Evaluation of the likelihood requires us to input a value for $\Sigma_0$. Maximization
can again be performed in practice using a modified Newton-Raphson procedure,
such as that of Berndt et al. (1974). References concerning properties of estimators
are given in Notes and Comments, although the literature for multivariate GARCH
is small.


Model checking and comparison. Residuals are calculated according to
$\hat{Z}_t = \hat{\Sigma}_t^{-1/2} X_t$ and should behave like a realization of an SWN$(0, I_d)$ process. The usual
univariate procedures (correlograms, correlograms of absolute values and portmanteau
tests such as Ljung-Box) can be applied to the component series of the residuals.
Also, there should not be any evidence of cross-correlations at any lags for either
the raw or the absolute residuals in the cross-correlogram.
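Computing the residuals amounts to solving a triangular system at each time point when the Cholesky factor is used as the square root. The following sketch assumes the fitted conditional covariance matrices are available as an array of shape (n, d, d); the function name is hypothetical.

```python
import numpy as np

def mgarch_residuals(X, Sigmas):
    """Residuals Z_t = Sigma_t^{-1/2} X_t using the Cholesky factor of each Sigma_t.

    X has shape (n, d) and Sigmas has shape (n, d, d).
    """
    Z = np.empty(X.shape, dtype=float)
    for t in range(X.shape[0]):
        L = np.linalg.cholesky(Sigmas[t])   # lower-triangular square root
        Z[t] = np.linalg.solve(L, X[t])     # solves L z = x_t
    return Z
```

The columns of the result can then be passed to whatever correlogram or portmanteau routines are available.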

Model selection is usually performed by a standard comparison of Akaike information
criterion (AIC) values, although it should be stressed that there is not yet much literature on
theoretical aspects of the use of the AIC in a univariate GARCH context, let alone a
multivariate one.


4.6.5 Dimension Reduction in MGARCH 


It is still true that attempting to model all financial risk factors with general multi- 
variate GARCH models is not recommended. Rather, these models have to be com- 
bined with factor-model strategies to reduce the overall dimension of the time series 
modelling problem. This is a large subject with many possible approaches and model 
structures and we give brief notes on some general strategies. 
As discussed in Section 3.4.1, a fundamental consideration is whether factors are
identified a priori and treated as observable exogenous variables, or whether they
are treated as latent and are manufactured from the observed data.


Observed factors. Suppose we adopted the former approach and identified a small
number of common factors $F_t$ to explain the variation in many risk factors $X_t$; we
might, for example, use stock index returns to explain the variation in individual
equity returns. These common factors could be modelled with relatively detailed
multivariate GARCH models. The dependence of the individual returns on the factor
returns could then be modelled by calibrating a factor model of the type

$$X_t = a + B F_t + \varepsilon_t, \quad t = 1, \ldots, n.$$

In Section 3.4.3 we showed how this may be done in a static way using regression
techniques. We now assume that, conditional on the factors $F_t$, the errors $\varepsilon_t$ form a
multivariate white noise process with GARCH volatility structure.

In an ideal factor model these errors would have a diagonal covariance matrix, 
because they would be attributable to idiosyncratic effects alone. In GARCH terms 
they might follow a pure diagonal model, i.e. a CCC model where the constant 
conditional correlation matrix is the identity matrix. A pure diagonal model can be 
fitted in two ways, which correspond to the two ways of estimating a static regression 
model. 


(1) Fit univariate models to the component series $X_{1,k}, \ldots, X_{n,k}$, $k = 1, \ldots, d$.
For each $k$ assume that

$$X_{t,k} = \mu_{t,k} + \varepsilon_{t,k}, \qquad \mu_{t,k} = a_k + b_k' F_t, \quad t = 1, \ldots, n,$$

where the errors $\varepsilon_{t,k}$ follow some univariate GARCH specification.


(2) Fit in one step the multivariate model

$$X_t = \mu_t + \varepsilon_t, \qquad \mu_t = a + B F_t, \quad t = 1, \ldots, n,$$

where the errors $\varepsilon_t$ follow a pure diagonal CCC model and the SWN$(0, I_d)$
process driving the GARCH model is some non-Gaussian spherical distribution,
such as an appropriate scaled $t$ distribution. (If the SWN is Gaussian,
approaches (1) and (2) give the same results.)


In practice, it is never possible to find the “right” common factors such that the 
idiosyncratic errors have a diagonal covariance structure. The pure diagonal assump- 
tion can be examined by looking at the errors from the GARCH modelling, esti- 
mating their correlation matrix and assessing its closeness to the identity matrix. In 
the case where correlation structure remains, the formal concept of the factor model 
can be loosened by allowing errors with a CCC-GARCH structure, which could be 
calibrated by two-stage estimation. 
Principal components GARCH. As an alternative approach we could attempt to
extend the idea of principal components to the time series context. A way of doing
this is suggested by the following formally defined model.


Definition 4.47. The process $(X_t)_{t\in\mathbb{Z}}$ follows a PC-GARCH (or orthogonal
GARCH) model if there exists some orthogonal matrix $\Gamma \in \mathbb{R}^{d\times d}$ satisfying
$\Gamma\Gamma' = \Gamma'\Gamma = I_d$ such that $(\Gamma' X_t)_{t\in\mathbb{Z}}$ follows a pure diagonal GARCH model.


If $(X_t)_{t\in\mathbb{Z}}$ follows a PC-GARCH process for some matrix $\Gamma$, then we can introduce
the process $(Y_t)_{t\in\mathbb{Z}}$, defined by $Y_t = \Gamma' X_t$, which satisfies $Y_t = \Delta_t Z_t$, where
$(Z_t)_{t\in\mathbb{Z}}$ is SWN$(0, I_d)$ and $\Delta_t$ is a (diagonal) volatility matrix with elements that
are updated according to univariate GARCH schemes and past values of the components
of $Y_t$. Since $X_t = \Gamma\Delta_t Z_t$, the conditional and unconditional covariance
matrices have the structure

$$\Sigma_t = \Gamma\Delta_t^2\Gamma', \qquad \Sigma = \Gamma E(\Delta_t^2)\Gamma', \qquad (4.76)$$

and are obviously symmetric and positive definite.

Comparing with (3.67) we see that the PC-GARCH model implies a spectral
decomposition of the conditional and unconditional covariance matrices. The eigenvalues
of the conditional covariance matrix, which are the elements of the diagonal
matrix $\Delta_t^2$, are given a GARCH updating structure. The eigenvectors form the
columns of $\Gamma$ and are used to construct the time series $(Y_t)_{t\in\mathbb{Z}}$, the principal component
transform of $(X_t)_{t\in\mathbb{Z}}$. It should be noted that despite the simple structure
of (4.76), the conditional correlation matrix of $X_t$ is not constant in this model.

This is again a model whose structure permits estimation in stages; in the first step
we calculate the spectral decomposition of the sample covariance matrix $S$ of the data
as in Section 3.4.4; this gives us an estimator $G$ of $\Gamma$. We then rotate the original
data to obtain sample principal components $\{G'X_t : t = 1, \ldots, n\}$. These should
be consistent with a pure diagonal model if the PC-GARCH is appropriate for the
original data; there should be no cross-correlation between the series at any lag. In
a second stage we fit univariate GARCH models to each time series of principal
components in turn; the residuals from these GARCH models should behave like
SWN$(0, I_d)$.

The main motivation for using principal components is to reduce dimensionality.
We expect that a subset of the principal components can explain the majority of
variability in both the conditional and unconditional covariance matrices. We use the
idea embodied in equation (3.70), that the first $k$ loading vectors in the matrix $\Gamma$
specify the most important principal components, and we write these columns in the
submatrix $\Gamma_1 \in \mathbb{R}^{d \times k}$ and use them to define factors
$F_t = (F_{t,1}, \ldots, F_{t,k})' := \Gamma_1' X_t$. These factors satisfy
$F_t = \tilde{\Delta}_t \tilde{Z}_t$, where $\tilde{\Delta}_t$ contains the upper $k \times k$ submatrix of
$\Delta_t$ and $\tilde{Z}_t \sim$ SWN$(0, I_k)$. In other words, the factors follow a pure diagonal
model of dimension $k < d$.

Following the idea in (3.70), the PC-GARCH model can then be thought of
as a factor model of the form $X_t = \Gamma_1 F_t + \varepsilon_t$, where the error term is usually
ignored in practice. The conditional covariance matrix is effectively approximated by
$\Sigma_t \approx \Gamma_1 \tilde{\Delta}_t^2 \Gamma_1'$. In practical terms, calibrating the model simply means
that we only need to fit GARCH models to the first $k$ time series of sample principal
components.
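The staged calibration just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: NumPy is assumed to be available, the function name `pc_garch_covariances` is our own, and the GARCH(1,1) parameters `omega`, `alpha` and `beta` are supplied by the user rather than fitted by maximum likelihood, so the second estimation stage is replaced by a plain variance recursion.

```python
import numpy as np

def pc_garch_covariances(X, k, omega, alpha, beta):
    """Approximate conditional covariance matrices from the first k principal components.

    X is an (n, d) array of mean-adjusted returns. omega, alpha and beta are
    length-k arrays of GARCH(1,1) parameters that are ASSUMED here, not fitted.
    """
    X = np.asarray(X, dtype=float)
    S = np.cov(X, rowvar=False)              # sample covariance matrix of the data
    eigval, eigvec = np.linalg.eigh(S)       # spectral decomposition (ascending eigenvalues)
    order = np.argsort(eigval)[::-1]         # reorder so the largest eigenvalue comes first
    G = eigvec[:, order]                     # estimator G of Gamma
    G1 = G[:, :k]                            # first k loading vectors
    F = X @ G1                               # first k sample principal components

    n = X.shape[0]
    sigma2 = np.empty((n, k))
    sigma2[0] = F.var(axis=0)                # start each recursion at the sample variance
    for t in range(1, n):
        # univariate GARCH(1,1) variance update for every retained component
        sigma2[t] = omega + alpha * F[t - 1] ** 2 + beta * sigma2[t - 1]

    # Sigma_t is approximated by G1 diag(sigma2_t) G1'
    Sigma = np.einsum("ik,tk,jk->tij", G1, sigma2, G1)
    return G, F, Sigma

# Usage on simulated data (purely illustrative numbers).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))
X = X - X.mean(axis=0)
G, F, Sigma = pc_garch_covariances(
    X, k=2, omega=np.full(2, 0.05), alpha=np.full(2, 0.10), beta=np.full(2, 0.85)
)
```

By construction the sample principal components are uncorrelated at lag zero, and each approximate $\Sigma_t$ is symmetric and positive semidefinite with rank at most $k$. In practice the assumed parameters would be replaced by estimates from a univariate GARCH fit to each component.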


4.6.6 MGARCH and Conditional Risk Measurement 


Suppose we calibrate an MGARCH model (possibly with VARMA conditional
mean structure) having the general structure $X_t = \mu_t + \Sigma_t^{1/2} Z_t$ to historical
risk-factor return data $X_{t-n+1}, \ldots, X_t$. We are interested in the loss distribution
of $L_{t+1} = l_{[t]}(X_{t+1})$ conditional on $\mathcal{F}_t = \sigma(\{X_s : s \le t\})$, as described in
Sections 2.1.1 and 2.1.2. (We may also be interested in longer-period losses as in
Section 2.3.4.)

A general method that could be applied is the Monte Carlo method of
Section 2.3.3: we could simulate many times the next value $X_{t+1}$ (and subsequent values
if needed) of the stochastic process $(X_t)_{t \in \mathbb{Z}}$ using estimates of $\mu_{t+1}$ and $\Sigma_{t+1}$.

Alternatively, a variance-covariance calculation as in Section 2.3.1 could be made.
Considering a linearized loss operator with the general form
$l^{\Delta}_{[t]}(x) = -(c_t + b_t' x)$, the moments of the conditional loss distribution would be

$$E(L^{\Delta}_{t+1} \mid \mathcal{F}_t) = -c_t - b_t' \mu_{t+1}, \qquad \operatorname{cov}(L^{\Delta}_{t+1} \mid \mathcal{F}_t) = b_t' \Sigma_{t+1} b_t.$$

Under an assumption of Gaussian innovations, $L^{\Delta}_{t+1} \mid \mathcal{F}_t$ would be univariate
Gaussian as in (2.30). Under an assumption of (scaled) t innovations, it would be
univariate t. Again we would need estimates of $\Sigma_{t+1}$ and $\mu_{t+1}$ from our time series
model, as in Section 4.4.2, and VaR and ES estimates would then follow easily for
these distributions from calculations in Examples 2.14, 2.18 and 2.19.
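For the Gaussian case the calculation can be sketched as follows. The sketch uses the standard formulas for a normal loss with mean $m$ and standard deviation $s$, namely $\mathrm{VaR}_\alpha = m + s\,\Phi^{-1}(\alpha)$ and $\mathrm{ES}_\alpha = m + s\,\phi(\Phi^{-1}(\alpha))/(1-\alpha)$. The function name and all numerical inputs are hypothetical, and the forecasts of $\mu_{t+1}$ and $\Sigma_{t+1}$ are taken as given.

```python
from statistics import NormalDist

def gaussian_var_es(c, b, mu, Sigma, alpha=0.99):
    """VaR and ES of the linearized loss L = -(c + b'X) when X ~ N(mu, Sigma)."""
    d = len(b)
    mean = -c - sum(b[i] * mu[i] for i in range(d))                  # -c - b'mu
    variance = sum(b[i] * Sigma[i][j] * b[j] for i in range(d) for j in range(d))  # b'Sigma b
    sd = variance ** 0.5
    std_normal = NormalDist()
    q = std_normal.inv_cdf(alpha)                                    # standard normal quantile
    var_alpha = mean + sd * q
    es_alpha = mean + sd * std_normal.pdf(q) / (1.0 - alpha)
    return var_alpha, es_alpha

# Hypothetical two-stock portfolio of value 1000 with equal weights, so b = V * w.
b = [500.0, 500.0]
mu_next = [0.0005, 0.0003]                          # assumed forecast of mu_{t+1}
Sigma_next = [[0.0004, 0.0001], [0.0001, 0.0009]]   # assumed forecast of Sigma_{t+1}
var99, es99 = gaussian_var_es(0.0, b, mu_next, Sigma_next, alpha=0.99)
```

Replacing the normal quantile and density by their scaled t counterparts would give the corresponding calculation under t innovations.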


Example 4.48. Consider again the simple stock portfolio in Example 2.4 and suppose
our time series model is a first-order DVEC model with a constant mean term.
The model takes the form

$$X_t - \mu = \Sigma_t^{1/2} Z_t, \qquad \Sigma_t = A_0 + A_1 \circ ((X_{t-1} - \mu)(X_{t-1} - \mu)') + B \circ \Sigma_{t-1}. \tag{4.77}$$

Suppose we assume that the innovations are multivariate Student t. The standard
risk measures applied to the linearized loss distribution would take the form

$$\mathrm{VaR}^t_\alpha = -V_t w_t' \mu + V_t \sqrt{\frac{(\nu - 2)\, w_t' \Sigma_{t+1} w_t}{\nu}}\; t_\nu^{-1}(\alpha),$$

$$\mathrm{ES}^t_\alpha = -V_t w_t' \mu + V_t \sqrt{\frac{(\nu - 2)\, w_t' \Sigma_{t+1} w_t}{\nu}}\; \frac{g_\nu(t_\nu^{-1}(\alpha))}{1 - \alpha}\, \frac{\nu + (t_\nu^{-1}(\alpha))^2}{\nu - 1},$$

where the notation is as in Example 2.19. Estimates of the risk measures are obtained
by replacing $\mu$, $\nu$ and $\Sigma_{t+1}$ by estimates. The latter can be calculated iteratively
from (4.77) using estimates of $A_0$, $A_1$ and $B$ and a starting value for $\Sigma_0$.


Multivariate EWMA. In Section 4.4.1 we saw how the EWMA or exponential
smoothing procedure could be used as a simple alternative to GARCH volatility
prediction. We note that there is a multivariate extension that may be used to make
one-step forecasts of conditional covariance matrices and that this can be thought
of as a simple alternative to using the updating scheme in (4.77).
We recall the univariate EWMA updating equation (4.47) and note that the
multivariate analogue is

$$\hat{\Sigma}_{t+1} = \alpha X_t X_t' + (1 - \alpha) \hat{\Sigma}_t, \tag{4.78}$$

where $\alpha$ is some small positive number (typically of the order $\alpha \approx 0.04$). This
method of updating is consistent with the idea of estimating $\Sigma_{t+1}$ by a weighted
sum of past values of the matrices $X_t X_t'$, where the weights decay exponentially:

$$\hat{\Sigma}_{t+1} = \alpha \sum_{i=0}^{n-1} (1 - \alpha)^i X_{t-i} X_{t-i}'.$$
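The link between the recursion (4.78) and the exponentially weighted sum can be checked numerically. The following sketch uses only the Python standard library; the function names, the three made-up observations and the zero starting matrix are our own assumptions. With a zero starting value the two calculations agree exactly; with any other starting value they differ by a term that decays like $(1-\alpha)^n$.

```python
def ewma_update(Sigma_hat, x, alpha=0.04):
    """One step of (4.78): alpha * x x' + (1 - alpha) * Sigma_hat."""
    d = len(x)
    return [[alpha * x[i] * x[j] + (1.0 - alpha) * Sigma_hat[i][j] for j in range(d)]
            for i in range(d)]

def ewma_weighted_sum(xs, alpha=0.04):
    """alpha * sum_i (1 - alpha)^i x_{t-i} x_{t-i}', with xs ordered oldest to newest."""
    d = len(xs[0])
    out = [[0.0] * d for _ in range(d)]
    for i, x in enumerate(reversed(xs)):      # i = 0 is the most recent observation
        w = alpha * (1.0 - alpha) ** i
        for r in range(d):
            for c in range(d):
                out[r][c] += w * x[r] * x[c]
    return out

# Illustrative data (made up): three observations of a two-dimensional return vector.
xs = [[0.01, -0.02], [0.00, 0.015], [-0.03, 0.01]]
Sigma_hat = [[0.0, 0.0], [0.0, 0.0]]          # zero starting value, an assumption
for x in xs:
    Sigma_hat = ewma_update(Sigma_hat, x)
direct = ewma_weighted_sum(xs)
```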


Notes and Comments 


The CCC-GARCH model was suggested by Bollerslev (1990), who used it to model 
European exchange-rate data before and after the introduction of the European 
Monetary System (EMS) and came to the expected conclusion that conditional 
correlations after the introduction of the EMS were higher. The idea of the DCC 
model is explored by Engle (2002), Engle and Sheppard (2001) and Tse and Tsui 
(2002). Fitting in stages is promoted in the formulation of Engle and Sheppard 
(2001) and asymptotic statistical theory for this procedure is given. Hafner and 
Franses (2003) suggest that the dynamics of CCC are too simple for collections of 
many asset returns and give a generalization. 

The DVEC model was proposed by Bollerslev, Engle and Wooldridge (1988). The 
more general (but overparametrized) VEC model is discussed in Engle and Kroner 
(1995) alongside the BEKK model, named after these two authors as well as Baba 
and Kraft, who co-authored an earlier unpublished manuscript. The condition for the 
positive definiteness of $\Sigma_t$ in (4.67), which suggests the parametrizations
(4.69)–(4.71), is described in Attanasio (1991).

There is limited work on statistical properties of QMLEs in multivariate mod- 
els: Jeantheau (1998) shows consistency for a general formulation and Comte and 
Lieberman (2003) show asymptotic normality for the BEKK formulation. 

The principal components GARCH (PC-GARCH) model was first described by 
Ding (1994) in a PhD thesis; under the name of orthogonal GARCH it has been 
extensively investigated by Alexander (2001). The latter shows how PC-GARCH 
can be used as a dimension reduction tool for expressing the conditional covariances 
of a number of asset return series in terms of a much smaller number of principal 
component return series. 

Survey articles by Bollerslev, Engle and Nelson (1994) and Bauwens, Laurent 
and Rombouts (2005) are useful sources of additional information and references 
for all of these multivariate models. 


5 


Copulas and Dependence 


In this chapter we look more closely at the issue of modelling the dependence among 
components of a random vector of financial risk factors using the concept of a copula. 
All readers are encouraged to read Section 5.1 in order to grasp the basic idea of a 
copula and to see examples. Thereafter the choice of material in this chapter may 
be based on the applied interests of the reader. 

Section 5.2 goes further into the issue of what it means to measure dependence. 
The limitations of linear correlation as a dependence measure are highlighted, par- 
ticularly when we leave the multivariate normal and elliptical distributions of Chap- 
ter 3 behind. Alternative dependence measures derived from copulas, such as rank 
correlations and coefficients of tail dependence, are discussed. Rank correlations 
are mainly of interest to readers who want to go on to calibrate copulas to data, 
while tail dependence is an important concept for all readers, since it addresses the 
phenomenon of joint extreme values in several risk factors, which is one of the major 
concerns in financial risk management (see also Section 4.1.2). 

In Section 5.3 we look in more detail at the copulas of normal mixture distribu- 
tions; these are the copulas that are used implicitly when normal mixture distribu- 
tions are fitted to multivariate risk-factor change data, as in Chapter 3. In Section 5.4 
we consider Archimedean copulas, which are widely used as dependence models 
in low-dimensional applications and which have also found an important niche in 
portfolio credit risk modelling, as will be seen in Chapters 8 and 9. The chapter ends 
with a section on fitting copulas to data. 


5.1 Copulas 


In a sense, every joint distribution function for a random vector of risk factors 
implicitly contains both a description of the marginal behaviour of individual risk 
factors and a description of their dependence structure; the copula approach provides 
a way of isolating the description of the dependence structure. It is of course only one 
way of treating dependence in multivariate risk models and is perhaps most natural 
in a static distributional context rather than a dynamic time series one. Nonetheless,
we view copulas as an extremely useful concept and see several advantages in 
introducing and studying them. 

First, copulas help in the understanding of dependence at a deeper level. They
allow us to see the potential pitfalls of approaches to dependence that focus only
on correlation and show us how to define a number of useful alternative dependence
measures. Copulas express dependence on a quantile scale, which is useful for
describing the dependence of extreme outcomes and is natural in a risk-management
context, where VaR has led us to think of risk in terms of quantiles of loss
distributions.

Moreover, copulas facilitate a bottom-up approach to multivariate model build- 
ing. This is particularly useful in risk management, where we very often have a 
much better idea about the marginal behaviour of individual risk factors than we do 
about their dependence structure. An example is furnished by credit risk, where the 
individual default risk of an obligor, while in itself difficult to estimate, is at least 
something we can get a better handle on than the dependence among default risks 
for several obligors. The copula approach allows us to combine our more developed 
marginal models with a variety of possible dependence models and to investigate 
the sensitivity of risk to the dependence specification. Since the copulas we present 
are easily simulated, they lend themselves in particular to Monte Carlo studies of 
risk. 


5.1.1 Basic Properties 


Definition 5.1 (copula). A d-dimensional copula is a distribution function on $[0, 1]^d$
with standard uniform marginal distributions.


We reserve the notation $C(u) = C(u_1, \ldots, u_d)$ for the multivariate dfs that are
copulas. Hence $C$ is a mapping of the form $C : [0, 1]^d \to [0, 1]$, i.e. a mapping of
the unit hypercube into the unit interval. The following three properties must hold.


(1) $C(u_1, \ldots, u_d)$ is increasing in each component $u_i$.
(2) $C(1, \ldots, 1, u_i, 1, \ldots, 1) = u_i$ for all $i \in \{1, \ldots, d\}$, $u_i \in [0, 1]$.
(3) For all $(a_1, \ldots, a_d), (b_1, \ldots, b_d) \in [0, 1]^d$ with $a_i \le b_i$ we have

$$\sum_{i_1=1}^{2} \cdots \sum_{i_d=1}^{2} (-1)^{i_1 + \cdots + i_d}\, C(u_{1 i_1}, \ldots, u_{d i_d}) \ge 0, \tag{5.1}$$

where $u_{j1} = a_j$ and $u_{j2} = b_j$ for all $j \in \{1, \ldots, d\}$.


The first property is clearly required of any multivariate df and the second property
is the requirement of uniform marginal distributions. The third property is less
obvious, but the so-called rectangle inequality in (5.1) ensures that if the random
vector $(U_1, \ldots, U_d)'$ has df $C$, then $P(a_1 \le U_1 \le b_1, \ldots, a_d \le U_d \le b_d)$ is
non-negative. These three properties characterize a copula; if a function $C$ fulfills
them, then it is a copula. Note also that, for $2 \le k < d$, the k-dimensional margins
of a d-dimensional copula are themselves copulas.
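The rectangle inequality is easy to evaluate by brute force in low dimensions. The sketch below (function names are our own; only the standard library is used) computes the left-hand side of (5.1) for the product function and the minimum function, and also for the function $\max(u_1 + \cdots + u_d + 1 - d, 0)$ in dimension three, where the sum turns out to be negative on the box $[\tfrac12, 1]^3$, so that this function cannot be a copula for $d = 3$.

```python
from itertools import product
from math import prod

def rectangle_volume(C, a, b):
    """Left-hand side of (5.1): the C-volume of the box [a_1, b_1] x ... x [a_d, b_d]."""
    d = len(a)
    total = 0.0
    for idx in product((1, 2), repeat=d):     # i_j = 1 picks a_j, i_j = 2 picks b_j
        u = [a[j] if idx[j] == 1 else b[j] for j in range(d)]
        total += (-1) ** sum(idx) * C(u)
    return total

def independence(u):
    return prod(u)

def comonotonicity(u):
    return min(u)

def lower_bound(u):
    """max(u_1 + ... + u_d + 1 - d, 0)."""
    return max(sum(u) + 1 - len(u), 0.0)

a, b = [0.2, 0.1, 0.5], [0.6, 0.4, 0.9]
vol_indep = rectangle_volume(independence, a, b)        # product of the side lengths
vol_como = rectangle_volume(comonotonicity, [0.2, 0.1], [0.6, 0.4])
vol_lower = rectangle_volume(lower_bound, [0.5] * 3, [1.0] * 3)
```

The number of terms grows like $2^d$, so this direct evaluation is only practical for small $d$.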


Some preliminaries. In working with copulas we must be familiar with the operations
of probability and quantile transformation, as well as the properties of generalized
inverses, which are summarized in Section A.1.2. The following elementary
proposition is found in many probability texts.

Proposition 5.2. Let $G$ be a distribution function and let $G^{\leftarrow}$ denote its generalized
inverse, i.e. the function $G^{\leftarrow}(y) = \inf\{x : G(x) \ge y\}$.


(1) Quantile transformation. If $U \sim U(0, 1)$ has a standard uniform distribution,
then $P(G^{\leftarrow}(U) \le x) = G(x)$.


(2) Probability transformation. If $Y$ has df $G$, where $G$ is a continuous univariate
df, then $G(Y) \sim U(0, 1)$.


Proof. Let $y \in \mathbb{R}$ and $u \in (0, 1)$. For the first part use the fact that

$$G(y) \ge u \iff G^{\leftarrow}(u) \le y$$

(see Proposition A.3(iv) in Section A.1.2), from which it follows that

$$P(G^{\leftarrow}(U) \le y) = P(U \le G(y)) = G(y).$$

For the second part we infer that

$$P(G(Y) \le u) = P(G^{\leftarrow} \circ G(Y) \le G^{\leftarrow}(u)) = P(Y \le G^{\leftarrow}(u)) = G \circ G^{\leftarrow}(u) = u,$$

where the first equality follows from the fact that $G^{\leftarrow}$ is strictly increasing
(Proposition A.3(ii)), the second follows from Proposition A.4, and the final equality is
Proposition A.3(viii).


Proposition 5.2(1) is the key to stochastic simulation. If we can generate a uniform
variate $U$ and compute the inverse of a df $G$, then we can sample from that df. Both
parts of the proposition taken together imply that we can transform risks with a
particular continuous df to have any other continuous distribution. For example, if $Y$
has a standard normal distribution, then $\Phi(Y)$ is uniform by Proposition 5.2(2), and,
since the quantile function of a standard exponential df $G$ is $G^{\leftarrow}(y) = -\ln(1 - y)$,
the transformed variable $Z := -\ln(1 - \Phi(Y))$ has a standard exponential distribution
by Proposition 5.2(1).
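Both transformations are easy to try out. The following sketch, which uses only the Python standard library and our own variable names, draws standard exponential variates once from uniforms by quantile transformation and once from standard normals by probability transformation followed by quantile transformation. In both cases the sample mean should be close to one.

```python
import math
import random
from statistics import NormalDist

def exp_quantile(u):
    """Quantile function of the standard exponential df: -ln(1 - u)."""
    return -math.log(1.0 - u)

rng = random.Random(42)
phi = NormalDist().cdf                        # standard normal df

# Quantile transformation: uniform variates are mapped to exponential variates.
sample_from_uniform = [exp_quantile(rng.random()) for _ in range(100_000)]

# Probability transformation, then quantile transformation: normal -> uniform -> exponential.
sample_from_normal = [exp_quantile(phi(rng.gauss(0.0, 1.0))) for _ in range(100_000)]

mean_u = sum(sample_from_uniform) / len(sample_from_uniform)
mean_n = sum(sample_from_normal) / len(sample_from_normal)
```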


Sklar’s Theorem. The importance of copulas in the study of multivariate distribu- 
tion functions is summarized by the following elegant theorem, which shows, firstly, 
that all multivariate dfs contain copulas and, secondly, that copulas may be used in 
conjunction with univariate dfs to construct multivariate dfs. 


Theorem 5.3 (Sklar 1959). Let $F$ be a joint distribution function with margins
$F_1, \ldots, F_d$. Then there exists a copula $C : [0, 1]^d \to [0, 1]$ such that, for all
$x_1, \ldots, x_d$ in $\bar{\mathbb{R}} = [-\infty, \infty]$,

$$F(x_1, \ldots, x_d) = C(F_1(x_1), \ldots, F_d(x_d)). \tag{5.2}$$

If the margins are continuous, then $C$ is unique; otherwise $C$ is uniquely determined
on $\operatorname{Ran} F_1 \times \operatorname{Ran} F_2 \times \cdots \times \operatorname{Ran} F_d$, where $\operatorname{Ran} F_i = F_i(\bar{\mathbb{R}})$ denotes the range of $F_i$.
Conversely, if $C$ is a copula and $F_1, \ldots, F_d$ are univariate distribution functions,
then the function $F$ defined in (5.2) is a joint distribution function with margins
$F_1, \ldots, F_d$.

Proof. We prove the existence and uniqueness of a copula in the case when
$F_1, \ldots, F_d$ are continuous and the converse statement in its general form. For a
full proof see Schweizer and Sklar (1983) or Nelsen (1999, p. 18).

For any $x_1, \ldots, x_d$ in $\bar{\mathbb{R}} = [-\infty, \infty]$ we may use similar reasoning to that used
in Lemma A.2(ii) to infer that if $X$ has df $F$, then

$$F(x_1, \ldots, x_d) = P(F_1(X_1) \le F_1(x_1), \ldots, F_d(X_d) \le F_d(x_d)).$$

Since $F_1, \ldots, F_d$ are continuous, Proposition 5.2(2) and Definition 5.1 imply that
the df of $(F_1(X_1), \ldots, F_d(X_d))$ is a copula, which we denote by $C$, and thus we
obtain the identity (5.2).

If we evaluate (5.2) at the arguments $x_i = F_i^{\leftarrow}(u_i)$, $0 \le u_i \le 1$, $i = 1, \ldots, d$,
and use Proposition A.3(viii), we obtain

$$C(u_1, \ldots, u_d) = F(F_1^{\leftarrow}(u_1), \ldots, F_d^{\leftarrow}(u_d)), \tag{5.3}$$

which gives an explicit representation of $C$ in terms of $F$ and its margins, and thus
shows uniqueness.

For the converse statement assume that $C$ is a copula and that $F_1, \ldots, F_d$ are
univariate dfs. We construct a random vector with df (5.2) by taking $U$ to be a
random vector with df $C$ and setting $X := (F_1^{\leftarrow}(U_1), \ldots, F_d^{\leftarrow}(U_d))'$. We then
verify, using Proposition A.3(iv), that

$$\begin{aligned}
P(X_1 \le x_1, \ldots, X_d \le x_d) &= P(F_1^{\leftarrow}(U_1) \le x_1, \ldots, F_d^{\leftarrow}(U_d) \le x_d) \\
&= P(U_1 \le F_1(x_1), \ldots, U_d \le F_d(x_d)) \\
&= C(F_1(x_1), \ldots, F_d(x_d)).
\end{aligned}$$

Formulas (5.2) and (5.3) are fundamental in dealing with copulas. The former
shows how joint distributions $F$ are formed by coupling together marginal
distributions with copulas $C$; the latter shows how copulas are extracted from multivariate
dfs with continuous margins. Moreover, (5.3) shows how copulas express dependence
on a quantile scale, since the value $C(u_1, \ldots, u_d)$ is the joint probability that
$X_1$ lies below its $u_1$-quantile, $X_2$ lies below its $u_2$-quantile, and so on. Sklar’s
Theorem also suggests that, in the case of continuous margins, it is natural to define the
notion of the copula of a distribution.


Definition 5.4 (copula of F). If the random vector $X$ has joint df $F$ with continuous
marginal distributions $F_1, \ldots, F_d$, then the copula of $F$ (or $X$) is the df $C$ of
$(F_1(X_1), \ldots, F_d(X_d))$.


Discrete distributions. The copula concept is slightly less natural for multivariate 
discrete distributions. This is because there is more than one copula that can be used 
to join the margins to form the joint df, as the following example shows. 
Example 5.5 (copulas of bivariate Bernoulli). Let $(X_1, X_2)$ have a bivariate
Bernoulli distribution satisfying

$$P(X_1 = 0, X_2 = 0) = \tfrac{1}{8}, \qquad P(X_1 = 1, X_2 = 1) = \tfrac{3}{8},$$
$$P(X_1 = 0, X_2 = 1) = \tfrac{2}{8}, \qquad P(X_1 = 1, X_2 = 0) = \tfrac{2}{8}.$$

Clearly, $P(X_1 = 0) = P(X_2 = 0) = \tfrac{3}{8}$ and the marginal distributions $F_1$ and $F_2$ of
$X_1$ and $X_2$ are the same. From Sklar’s Theorem we know that

$$P(X_1 \le x_1, X_2 \le x_2) = C(P(X_1 \le x_1), P(X_2 \le x_2))$$

for all $x_1, x_2$ and some copula $C$. Since $\operatorname{Ran} F_1 = \operatorname{Ran} F_2 = \{0, \tfrac{3}{8}, 1\}$, clearly the
only constraint on $C$ is that $C(\tfrac{3}{8}, \tfrac{3}{8}) = \tfrac{1}{8}$. Any copula fulfilling this constraint is a
copula of $(X_1, X_2)$, and there are infinitely many such copulas.


Invariance. A useful property of the copula of a distribution is its invariance under 
strictly increasing transformations of the marginals. In view of Sklar’s Theorem and 
this invariance property, we interpret the copula of a distribution as a very natural 
way of representing the dependence structure of that distribution, certainly in the 
case of continuous margins. 


Proposition 5.6. Let $(X_1, \ldots, X_d)$ be a random vector with continuous margins
and copula $C$ and let $T_1, \ldots, T_d$ be strictly increasing functions. Then
$(T_1(X_1), \ldots, T_d(X_d))$ also has copula $C$.


Proof. First we show that the transformed variable $T_i(X_i)$ has continuous df
$\tilde{F}_i(y) := F_i \circ T_i^{\leftarrow}(y)$. To see this, observe that Proposition A.3(vii) implies

$$\tilde{F}_i(y) = P(X_i \le T_i^{\leftarrow}(y)) = P(T_i^{\leftarrow} \circ T_i(X_i) \le T_i^{\leftarrow}(y)).$$

Since $T_i^{\leftarrow}$ is an increasing (but not strictly increasing) transformation, we may use
Lemma A.2(ii) to deduce

$$\tilde{F}_i(y) = P(T_i(X_i) \le y) + P(X_i = T_i^{\leftarrow}(y),\; T_i(X_i) > y),$$

but the second probability on the right-hand side is zero, since $F_i$ is continuous.
Since $C$ is the copula of $X$, we can now calculate that

$$\begin{aligned}
C(u_1, \ldots, u_d) &= P(F_1(X_1) \le u_1, \ldots, F_d(X_d) \le u_d) \\
&= P(\tilde{F}_1(T_1(X_1)) \le u_1, \ldots, \tilde{F}_d(T_d(X_d)) \le u_d),
\end{aligned}$$

because $\tilde{F}_i \circ T_i(x) = F_i \circ T_i^{\leftarrow} \circ T_i(x) = F_i(x)$ by Proposition A.3(vii). It follows
from Definition 5.4 that $C$ is also the copula of $(T_1(X_1), \ldots, T_d(X_d))$.


Fréchet bounds. We close this section by establishing the important Fréchet 
bounds for copulas, which turn out to have important dependence interpretations 
that are discussed further in Sections 5.1.2 and 5.1.6. 


Theorem 5.7. For every copula $C(u_1, \ldots, u_d)$ we have the bounds

$$\max\Big\{ \sum_{i=1}^{d} u_i + 1 - d,\; 0 \Big\} \le C(u) \le \min\{u_1, \ldots, u_d\}. \tag{5.4}$$

Proof. The second inequality follows from the fact that, for all $i$,

$$\bigcap_{1 \le j \le d} \{U_j \le u_j\} \subset \{U_i \le u_i\}.$$

For the first inequality observe that

$$C(u) = P\Big( \bigcap_{1 \le i \le d} \{U_i \le u_i\} \Big) = 1 - P\Big( \bigcup_{1 \le i \le d} \{U_i > u_i\} \Big)
\ge 1 - \sum_{i=1}^{d} P(U_i > u_i) = 1 - d + \sum_{i=1}^{d} u_i.$$

The lower and upper bounds will be given the notation $W(u_1, \ldots, u_d)$ and
$M(u_1, \ldots, u_d)$, respectively.
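A quick numerical sanity check of (5.4) can be written as follows. This is only a sketch with our own function names, using the standard library: it evaluates a candidate function at randomly chosen points of the unit hypercube and tests whether its values lie between $W$ and $M$. Passing the check is necessary but of course not sufficient for a function to be a copula.

```python
import random
from math import prod

def W(u):
    """Frechet lower bound max(u_1 + ... + u_d + 1 - d, 0)."""
    return max(sum(u) + 1 - len(u), 0.0)

def M(u):
    """Frechet upper bound min(u_1, ..., u_d)."""
    return min(u)

def within_frechet_bounds(C, d, trials=10_000, seed=0, tol=1e-12):
    """Check W(u) <= C(u) <= M(u) at random points u of [0, 1)^d."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.random() for _ in range(d)]
        if not (W(u) - tol <= C(u) <= M(u) + tol):
            return False
    return True

# The product of the arguments respects the bounds in every dimension tried.
ok = all(within_frechet_bounds(prod, d) for d in (2, 3, 5))
# The maximum of the arguments violates the upper bound, so it cannot be a copula.
bad = within_frechet_bounds(max, 2)
```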


Remark 5.8. Although we give Fréchet bounds for a copula, Fréchet bounds may
be given for any multivariate df. For a multivariate df $F$ with margins $F_1, \ldots, F_d$
we establish by similar reasoning that

$$\max\Big\{ \sum_{i=1}^{d} F_i(x_i) + 1 - d,\; 0 \Big\} \le F(x) \le \min\{F_1(x_1), \ldots, F_d(x_d)\}, \tag{5.5}$$

so we have bounds for $F$ in terms of its own marginal distributions.


5.1.2 Examples of Copulas 


We provide a number of examples of copulas in this section and these are subdivided 
into three categories: fundamental copulas represent a number of important special 
dependence structures; implicit copulas are extracted from well-known multivariate 
distributions using Sklar’s Theorem, but do not necessarily possess simple closed- 
form expressions; explicit copulas have simple closed-form expressions and follow 
general mathematical constructions known to yield copulas. 


Fundamental copulas. The independence copula is

$$\Pi(u_1, \ldots, u_d) = \prod_{i=1}^{d} u_i. \tag{5.6}$$

It is clear from Sklar’s Theorem, and equation (5.2) in particular, that rvs with
continuous distributions are independent if and only if their dependence structure is
given by (5.6).

The comonotonicity copula is the Fréchet upper bound copula from (5.4):

$$M(u_1, \ldots, u_d) = \min\{u_1, \ldots, u_d\}. \tag{5.7}$$

Observe that this special copula is the joint df of the random vector $(U, \ldots, U)$,
where $U \sim U(0, 1)$. Suppose that the rvs $X_1, \ldots, X_d$ have continuous dfs and
are perfectly positively dependent in the sense that they are almost surely strictly
increasing functions of each other so that $X_i = T_i(X_1)$ almost surely for
$i = 2, \ldots, d$. As we have shown in the proof of Proposition 5.6, the df of $X_i$, $i \ge 2$, is
given by $F_i = F_1 \circ T_i^{\leftarrow}$, and by Definition 5.4 the copula of $(X_1, \ldots, X_d)$ is the df
of

$$(F_1(X_1),\; F_1 \circ T_2^{\leftarrow} \circ T_2(X_1),\; \ldots,\; F_1 \circ T_d^{\leftarrow} \circ T_d(X_1)).$$

Writing $U = F_1(X_1)$ and using Proposition A.3(vii), we see that this is the df of
$(U, \ldots, U)$, i.e. the copula (5.7). The comonotonicity copula thus represents perfect
dependence and we discuss this concept further in Section 5.1.6.

Figure 5.1. (a)-(c) Perspective plots and (d)-(f) contour plots of the three fundamental
copulas: (a), (d) countermonotonicity, (b), (e) independence and (c), (f) comonotonicity.
Note that these are plots of distribution functions.

The countermonotonicity copula is the two-dimensional Fréchet lower bound
copula from (5.4) given by

$$W(u_1, u_2) = \max\{u_1 + u_2 - 1, 0\}. \tag{5.8}$$

This copula is the joint df of the random vector $(U, 1 - U)$, where $U \sim U(0, 1)$.
If $X_1$ and $X_2$ have continuous dfs and are perfectly negatively dependent in the
sense that $X_2$ is almost surely a strictly decreasing function of $X_1$, then (5.8) is their
copula. We discuss perfect negative dependence in more detail in Section 5.1.6,
where we see that an extension of the countermonotonicity concept to dimensions
higher than two is not possible.

Perspective pictures and contour plots for the three fundamental copulas are given 
in Figure 5.1. The Fréchet bounds (5.4) imply that all bivariate copulas lie between 
the surfaces in (a) and (c). 


Implicit copulas. If $Y \sim N_d(\mu, \Sigma)$ is a Gaussian random vector, then its copula is
a so-called Gauss copula. Since the operation of standardizing the margins amounts
to applying a series of strictly increasing transformations, Proposition 5.6 implies
that the copula of $Y$ is exactly the same as the copula of $X \sim N_d(0, P)$, where
$P = \wp(\Sigma)$ is the correlation matrix of $Y$. By Definition 5.4 this copula is given by

$$\begin{aligned}
C_P^{\mathrm{Ga}}(u) &= P(\Phi(X_1) \le u_1, \ldots, \Phi(X_d) \le u_d) \\
&= \Phi_P(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)),
\end{aligned} \tag{5.9}$$

where $\Phi$ denotes the standard univariate normal df and $\Phi_P$ denotes the joint df of
$X$. The notation $C_P^{\mathrm{Ga}}$ emphasizes that the copula is parametrized by the $\tfrac{1}{2} d(d - 1)$
parameters of the correlation matrix; in two dimensions we write $C_\rho^{\mathrm{Ga}}$, where
$\rho = \rho(X_1, X_2)$.

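The Gauss copula does not have a simple closed form, but can be expressed as an
integral over the density of $X$; in two dimensions for $|\rho| < 1$ we have, using (5.9),
that

$$C_\rho^{\mathrm{Ga}}(u_1, u_2) = \int_{-\infty}^{\Phi^{-1}(u_1)} \int_{-\infty}^{\Phi^{-1}(u_2)} \frac{1}{2\pi (1 - \rho^2)^{1/2}} \exp\Big\{ \frac{-(s_1^2 - 2\rho s_1 s_2 + s_2^2)}{2(1 - \rho^2)} \Big\} \, ds_1 \, ds_2.$$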


Note that both the independence and comonotonicity copulas are special cases of
the Gauss copula. If $P = I_d$, we obtain the independence copula (5.6); if $P = J_d$, the
$d \times d$ matrix consisting entirely of ones, then we obtain comonotonicity (5.7). Also,
for $d = 2$ and $\rho = -1$ the Gauss copula is equal to the countermonotonicity copula
(5.8). Thus in two dimensions the Gauss copula can be thought of as a dependence
structure that interpolates between perfect positive and negative dependence, where
the parameter $\rho$ represents the strength of dependence.

Perspective plots and contour lines of the bivariate Gauss copula with $\rho = 0.7$
are shown in Figure 5.2(a),(c); these may be compared with the contour lines of the
independence and perfect dependence copulas in Figure 5.1. Note that these pictures
show contour lines of distribution functions and not densities; a picture of the Gauss
copula density is given in Figure 5.5.

In the same way that we can extract a copula from the multivariate normal
distribution, we can extract an implicit copula from any other distribution with continuous
marginal dfs. For example, the d-dimensional t copula takes the form

$$C_{\nu, P}^{t}(u) = t_{\nu, P}(t_\nu^{-1}(u_1), \ldots, t_\nu^{-1}(u_d)), \tag{5.10}$$

where $t_\nu$ is the df of a standard univariate t distribution, $t_{\nu, P}$ is the joint df of the
vector $X \sim t_d(\nu, 0, P)$ and $P$ is a correlation matrix. As in the case of the Gauss
copula, if $P = J_d$ then we obtain comonotonicity (5.7). However, in contrast to
the Gauss copula, if $P = I_d$ we do not obtain the independence copula (assuming
$\nu < \infty$) since uncorrelated multivariate t-distributed rvs are not independent (see
Lemma 3.5).


Explicit copulas. While the Gaussian and t copulas are copulas implied by well-known
multivariate dfs and do not themselves have simple closed forms, we can
write down a number of copulas which do have simple closed forms. An example
is the bivariate Gumbel copula:

$$C_\theta^{\mathrm{Gu}}(u_1, u_2) = \exp\{ -((-\ln u_1)^\theta + (-\ln u_2)^\theta)^{1/\theta} \}, \qquad 1 \le \theta < \infty. \tag{5.11}$$

If $\theta = 1$ we obtain the independence copula as a special case, and the limit of $C_\theta^{\mathrm{Gu}}$
as $\theta \to \infty$ is the two-dimensional comonotonicity copula. Thus the Gumbel copula
interpolates between independence and perfect dependence and the parameter $\theta$
represents the strength of dependence. Perspective plot and contour lines for the
Gumbel copula with parameter $\theta = 2$ are shown in Figure 5.2(b),(d). They appear
to be very similar to the picture for the Gauss copula, but Example 5.11 will show
that the Gaussian and Gumbel dependence structures are quite different.

Figure 5.2. (a), (b) Perspective plots and (c), (d) contour plots of the Gaussian and Gumbel
copulas, with parameters $\rho = 0.7$ and $\theta = 2$, respectively. Note that these are plots of
distribution functions; a picture of the Gauss copula density is given in Figure 5.5.

A further example is the bivariate Clayton copula:

$$C_\theta^{\mathrm{Cl}}(u_1, u_2) = (u_1^{-\theta} + u_2^{-\theta} - 1)^{-1/\theta}, \qquad 0 < \theta < \infty. \tag{5.12}$$

In the limit as $\theta \to 0$ we approach the independence copula, and as $\theta \to \infty$ we
approach the two-dimensional comonotonicity copula.

The Gumbel and Clayton copulas belong to the Archimedean copula family and 
we provide more discussion of this family, including the issue of higher-dimensional 
extensions, in Section 5.4. 
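The closed forms (5.11) and (5.12) translate directly into code, and the limiting cases mentioned above can be checked numerically. The sketch below uses only the standard library; the function names and the evaluation point $(0.3, 0.8)$ are our own choices, and the limits are approximated by taking $\theta$ very large or very close to zero.

```python
import math

def gumbel_copula(u1, u2, theta):
    """Bivariate Gumbel copula (5.11), theta >= 1, for u1, u2 in (0, 1]."""
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta
    return math.exp(-s ** (1.0 / theta))

def clayton_copula(u1, u2, theta):
    """Bivariate Clayton copula (5.12), theta > 0, for u1, u2 in (0, 1]."""
    return (u1 ** (-theta) + u2 ** (-theta) - 1.0) ** (-1.0 / theta)

u1, u2 = 0.3, 0.8
gumbel_at_one = gumbel_copula(u1, u2, 1.0)         # theta = 1: independence copula
gumbel_large = gumbel_copula(u1, u2, 50.0)         # large theta: close to min(u1, u2)
clayton_small = clayton_copula(u1, u2, 1e-6)       # theta near 0: close to u1 * u2
clayton_large = clayton_copula(u1, u2, 50.0)       # large theta: close to min(u1, u2)
```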


5.1.3 Meta Distributions 


The converse statement of Sklar’s Theorem provides a very powerful technique
for constructing multivariate distributions with arbitrary margins and copulas; we
know that if we start with a copula $C$ and margins $F_1, \ldots, F_d$, then
$F(x) := C(F_1(x_1), \ldots, F_d(x_d))$ defines a multivariate df with margins $F_1, \ldots, F_d$.

Consider, for example, building a distribution with the Gauss copula $C_P^{\mathrm{Ga}}$ but
arbitrary margins; such a model is known as a meta-Gaussian distribution. In the
area of credit risk modelling an example is Li’s model (see Example 8.7), where the
Gauss copula is used to join together exponential margins to obtain a model for the
default times of companies when these default times are considered to be dependent.

We extend the meta terminology to other distributions, so, for example, a meta-$t_\nu$
distribution has the copula $C_{\nu, P}^{t}$ and arbitrary margins, and a meta-Clayton
distribution has the Clayton copula and arbitrary margins.


5.1.4 Simulation of Copulas and Meta Distributions 


It should be apparent from the way the implicit copulas in Section 5.1.2 were
extracted from well-known distributions that it is particularly easy to sample from
these copulas, provided we can sample from the distribution from which they are
extracted. If we can generate a vector $X$ with the df $F$, we can transform each
component with its own marginal df to obtain a vector
$U = (U_1, \ldots, U_d)' = (F_1(X_1), \ldots, F_d(X_d))'$ with df $C$, the copula of $F$. Particular
examples are given in the following algorithms.


Algorithm 5.9 (simulation of Gauss copula).
(1) Generate $Z \sim N_d(0, P)$ using Algorithm 3.2.


(2) Return $U = (\Phi(Z_1), \ldots, \Phi(Z_d))'$, where $\Phi$ is the standard normal df. The
random vector $U$ has df $C_P^{\mathrm{Ga}}$.


Algorithm 5.10 (simulation of t copula).
(1) Generate $X \sim t_d(\nu, 0, P)$ using Algorithm 3.10.


(2) Return $U = (t_\nu(X_1), \ldots, t_\nu(X_d))'$, where $t_\nu$ denotes the df of a standard
univariate t distribution. The random vector $U$ has df $C_{\nu, P}^{t}$.
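As an illustration, the two steps of Algorithm 5.9 can be sketched for the bivariate case with the standard library alone. The function name is ours, and step (1) is carried out with the Cholesky factor of the $2 \times 2$ correlation matrix, which we assume is in the spirit of Algorithm 3.2. For the t copula the same pattern applies, with a multivariate t generator and the df $t_\nu$ in place of the normal ones.

```python
import math
import random
from statistics import NormalDist

def simulate_gauss_copula(rho, n, seed=0):
    """Draw n points from the bivariate Gauss copula with correlation rho, |rho| < 1."""
    rng = random.Random(seed)
    phi = NormalDist().cdf                    # standard normal df
    points = []
    for _ in range(n):
        # Step (1): Z ~ N_2(0, P) via the Cholesky factor of P = [[1, rho], [rho, 1]].
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1 = e1
        z2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * e2
        # Step (2): apply the standard normal df to each component.
        points.append((phi(z1), phi(z2)))
    return points

U = simulate_gauss_copula(rho=0.7, n=2000)
```

Each coordinate of the simulated points should look standard uniform, while a scatter plot of the pairs shows the positive dependence induced by $\rho = 0.7$.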


The Clayton and Gumbel copulas present slightly more challenging simulation 
problems and we give algorithms in Section 5.4 after looking at the structure of these 
copulas in more detail. These algorithms will, however, be used in Example 5.11 
below. 

Assume that the problem of generating realizations $U$ from a particular copula has
been solved. The converse of Sklar’s Theorem shows us how we can sample from
interesting meta distributions that combine this copula with an arbitrary choice
of marginal distribution. If $U$ has df $C$, then we use quantile transformation to
obtain $X := (F_1^{\leftarrow}(U_1), \ldots, F_d^{\leftarrow}(U_d))'$, which is a random vector with margins
$F_1, \ldots, F_d$ and multivariate df $C(F_1(x_1), \ldots, F_d(x_d))$. This technique is extremely
useful in Monte Carlo studies of risk and will be discussed further in the context of
Example 5.56.
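The quantile transformation step is independent of how the copula sample was produced, which the following sketch tries to make explicit. The helper names `meta_sample` and `exponential_quantile` and the rates of the exponential margins are hypothetical; the copula input is a bivariate Gauss copula sample generated with a Cholesky step, so the result is a sample from a meta-Gaussian distribution with exponential margins.

```python
import math
import random
from statistics import NormalDist

def meta_sample(copula_points, quantile_functions):
    """Apply the i-th quantile function to the i-th coordinate of every copula point."""
    return [tuple(q(u) for q, u in zip(quantile_functions, point))
            for point in copula_points]

def exponential_quantile(rate):
    """Quantile function u -> -ln(1 - u) / rate of an exponential df."""
    return lambda u: -math.log(1.0 - u) / rate

# Copula input: 20 000 points from the bivariate Gauss copula with rho = 0.7.
rng = random.Random(0)
phi = NormalDist().cdf
rho = 0.7
U = []
for _ in range(20_000):
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    U.append((phi(e1), phi(rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)))

# Hypothetical exponential margins with rates 0.5 and 2.0 (means 2.0 and 0.5).
X = meta_sample(U, [exponential_quantile(0.5), exponential_quantile(2.0)])
```

Swapping in a different list of quantile functions changes the margins without touching the dependence structure.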
Figure 5.3. Two thousand simulated points from the (a) Gaussian, (b) Gumbel,
(c) Clayton and (d) t copulas. See Example 5.11 for parameter choices and interpretation.


Example 5.11 (various copulas compared). In Figure 5.3 we show 2000 simulated
points from four copulas: the Gauss copula (5.9) with parameter $\rho = 0.7$;
the Gumbel copula (5.11) with parameter $\theta = 2$; the Clayton copula (5.12) with
parameter $\theta = 2.2$; the t copula (5.10) with parameters $\nu = 4$ and $\rho = 0.71$.

In Figure 5.4 we transform these points componentwise using the quantile func- 
tion of the standard normal distribution to get realizations from four different meta 
distributions with standard normal margins. The Gaussian picture shows data gen- 
erated from a standard bivariate normal distribution with correlation 70%. The 
other pictures show data generated from unusual distributions that have been cre- 
ated using the converse of Sklar’s Theorem; the parameters of the copulas have 
been chosen so that all of these distributions have a linear correlation that is 
roughly 70%. 

Considering the Gumbel picture, these are bivariate data with a meta-Gumbel
distribution with df $C_\theta^{\mathrm{Gu}}(\Phi(x_1), \Phi(x_2))$, where $\theta = 2$. The Gumbel copula causes
this distribution to have upper tail dependence, a concept defined formally in
Section 5.2.3. Roughly speaking, there is much more of a tendency for $X_2$ to be extreme
when $X_1$ is extreme, and vice versa, a phenomenon which would obviously be
worrying when $X_1$ and $X_2$ are interpreted as potential financial losses. The Clayton
copula turns out to have lower tail dependence, and the t copula to have both lower
and upper tail dependence; in contrast, the Gauss copula does not have tail dependence
and this can also be glimpsed in Figure 5.2. In the upper-right-hand corner
the contours of the Gauss copula are more like those of the independence copula of
Figure 5.1 than the perfect dependence copula.

Figure 5.4. Two thousand simulated points from four distributions with standard normal
margins, constructed using the copula data from Figure 5.3 ((a) Gaussian, (b) Gumbel,
(c) Clayton and (d) t). The Gaussian picture shows points from a standard bivariate normal
with correlation 70%; other pictures show distributions with non-Gauss copulas constructed
to have a linear correlation of roughly 70%. See Example 5.11 for parameter choices and
interpretation.

Note that the qualitative differences between the distributions are explained by 
the copula alone; we can construct similar pictures where the marginal distributions 
are exponential or Student t, or any other univariate distribution.
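The construction of a meta distribution described above can be sketched in a few lines. The following is only an illustration under stated assumptions: it samples the Clayton copula with a gamma-frailty (Marshall-Olkin) method, which is a swapped-in technique and not necessarily the one used to produce the figures, and the function names, seed and sample size are hypothetical.

```python
import math
import random
from statistics import NormalDist

def sample_clayton(theta, n, rng):
    """Draw n pairs from a bivariate Clayton copula via a gamma frailty."""
    pairs = []
    for _ in range(n):
        v = rng.gammavariate(1.0 / theta, 1.0)          # frailty V ~ Gamma(1/theta, 1)
        e1, e2 = rng.expovariate(1.0), rng.expovariate(1.0)
        # U_i = psi(E_i / V) with psi(t) = (1 + t)^(-1/theta)
        pairs.append(((1.0 + e1 / v) ** (-1.0 / theta),
                      (1.0 + e2 / v) ** (-1.0 / theta)))
    return pairs

def to_normal_margins(pairs, eps=1e-12):
    """Apply the standard normal quantile function componentwise."""
    q = NormalDist().inv_cdf
    clip = lambda u: min(max(u, eps), 1.0 - eps)        # keep inv_cdf inside (0, 1)
    return [(q(clip(u1)), q(clip(u2))) for u1, u2 in pairs]

def pearson(xs, ys):
    """Sample linear correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2005)                               # arbitrary seed
points = to_normal_margins(sample_clayton(2.2, 5000, rng))
print(pearson([p[0] for p in points], [p[1] for p in points]))
```

Replacing `NormalDist().inv_cdf` by another quantile function changes the margins but leaves the copula, and hence the qualitative shape of the dependence, unchanged.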


5.1.5 Further Properties of Copulas 


Survival copulas. A version of Sklar’s identity (5.2) also applies to multivariate
survival functions of distributions. Let $X$ be a random vector with multivariate
survival function $\bar F$, marginal dfs $F_1, \dots, F_d$ and marginal survival functions
$\bar F_1, \dots, \bar F_d$, i.e. $\bar F_i = 1 - F_i$. We have the identity

$$\bar F(x_1, \dots, x_d) = \hat C(\bar F_1(x_1), \dots, \bar F_d(x_d)) \tag{5.13}$$

for a copula $\hat C$, which is known as a survival copula. In the case when
$F_1, \dots, F_d$ are continuous this identity is easily established by noting that

$$\begin{aligned}
\bar F(x_1, \dots, x_d) &= P(X_1 > x_1, \dots, X_d > x_d) \\
&= P(1 - F_1(X_1) \le \bar F_1(x_1), \dots, 1 - F_d(X_d) \le \bar F_d(x_d)),
\end{aligned}$$

so (5.13) follows by writing $\hat C$ for the distribution function of $1 - U$, where
$U := (F_1(X_1), \dots, F_d(X_d))$. In general, the term survival copula of a copula $C$
will be used to denote the df of $1 - U$ when $U$ has df $C$.


Example 5.12 (survival copula of a bivariate Pareto distribution). A well-known
generalization of the important univariate Pareto distribution is the bivariate Pareto
distribution with survivor function given by

$$\bar F(x_1, x_2) = \left( \frac{x_1 + \kappa_1}{\kappa_1} + \frac{x_2 + \kappa_2}{\kappa_2} - 1 \right)^{-\alpha},
\qquad x_1, x_2 \ge 0, \quad \alpha, \kappa_1, \kappa_2 > 0.$$

It is easily confirmed that the marginal survivor functions are given by
$\bar F_i(x) = (\kappa_i / (\kappa_i + x))^{\alpha}$, $i = 1, 2$, and we then infer from (5.13) that the
survival copula is given by
$\hat C(u_1, u_2) = (u_1^{-1/\alpha} + u_2^{-1/\alpha} - 1)^{-\alpha}$. Comparison with (5.12) reveals that
this is the Clayton copula with parameter $\theta = 1/\alpha$.
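The identification with the Clayton copula can be checked numerically. This is a minimal sketch: the function names and the parameter values are hypothetical, and the Clayton form used is $(u_1^{-\theta} + u_2^{-\theta} - 1)^{-1/\theta}$, assumed to match (5.12).

```python
def pareto_survival(x1, x2, alpha, k1, k2):
    """Joint survivor function of the bivariate Pareto distribution."""
    return ((x1 + k1) / k1 + (x2 + k2) / k2 - 1.0) ** (-alpha)

def marginal_survival_inverse(u, alpha, k):
    """Solve (k / (k + x))^alpha = u for x >= 0."""
    return k * (u ** (-1.0 / alpha) - 1.0)

def survival_copula(u1, u2, alpha, k1, k2):
    """Evaluate the joint survivor function at the marginal survival quantiles."""
    x1 = marginal_survival_inverse(u1, alpha, k1)
    x2 = marginal_survival_inverse(u2, alpha, k2)
    return pareto_survival(x1, x2, alpha, k1, k2)

def clayton(u1, u2, theta):
    """Bivariate Clayton copula."""
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)

alpha, k1, k2 = 3.0, 1.0, 5.0                     # illustrative parameter values
for u1, u2 in [(0.1, 0.9), (0.5, 0.5), (0.8, 0.3)]:
    print(survival_copula(u1, u2, alpha, k1, k2), clayton(u1, u2, 1.0 / alpha))
```

The scale parameters drop out entirely, as they must for a copula.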


The useful concept of radial symmetry can be expressed in terms of copulas and 
survival copulas. 


Definition 5.13 (radial symmetry). A random vector $X$ (or its df) is radially
symmetric about $a$ if $X - a \overset{d}{=} a - X$.


An elliptical random vector $X \sim E_d(\mu, \Sigma, \psi)$ is obviously radially symmetric
about $\mu$. If $U$ has df $C$, where $C$ is a copula, then the only possible centre of
symmetry is $(0.5, \dots, 0.5)$, so $C$ is radially symmetric if

$$(U_1 - 0.5, \dots, U_d - 0.5) \overset{d}{=} (0.5 - U_1, \dots, 0.5 - U_d) \iff U \overset{d}{=} 1 - U.$$

Thus if a copula $C$ is radially symmetric and $\hat C$ is its survival copula, we have
$\hat C = C$. It is easily seen that the copulas of elliptical distributions are radially
symmetric but the Gumbel and Clayton copulas are not.
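A quick numerical check of radial symmetry in the bivariate case is sketched below. It uses the fact that the df of $(1 - U_1, 1 - U_2)$ is $u_1 + u_2 - 1 + C(1 - u_1, 1 - u_2)$ by inclusion-exclusion; the helper names and the grid are hypothetical choices.

```python
def clayton(u1, u2, theta=2.0):
    """Bivariate Clayton copula."""
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)

def independence(u1, u2):
    """Independence copula."""
    return u1 * u2

def survival_copula(copula, u1, u2):
    """df of (1 - U1, 1 - U2) when (U1, U2) has df `copula`."""
    return u1 + u2 - 1.0 + copula(1.0 - u1, 1.0 - u2)

def max_asymmetry(copula, steps=20):
    """Largest gap between a copula and its survival copula on a grid."""
    grid = [i / steps for i in range(1, steps)]
    return max(abs(survival_copula(copula, a, b) - copula(a, b))
               for a in grid for b in grid)

print(max_asymmetry(independence))   # zero up to rounding: radially symmetric
print(max_asymmetry(clayton))        # clearly positive: not radially symmetric
```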

Survival copulas should not be confused with the survival functions of copulas,
which are not themselves copulas. Since copulas are simply multivariate dfs, they
have survival or tail functions, which we denote by $\bar C$. If $U$ has df $C$ and the
survival copula of $C$ is $\hat C$, then

$$\begin{aligned}
\bar C(u_1, \dots, u_d) &= P(U_1 > u_1, \dots, U_d > u_d) \\
&= P(1 - U_1 \le 1 - u_1, \dots, 1 - U_d \le 1 - u_d) \\
&= \hat C(1 - u_1, \dots, 1 - u_d).
\end{aligned}$$

A useful relationship between a copula and its survival copula in the bivariate case
is that

$$\hat C(1 - u_1, 1 - u_2) = 1 - u_1 - u_2 + C(u_1, u_2). \tag{5.14}$$

Conditional distributions of copulas. It is often of interest to look at conditional
distributions of copulas. We concentrate on two dimensions and suppose that
$(U_1, U_2)$ has df $C$. Since a copula is an increasing continuous function in each
argument,

$$C_{U_2 \mid U_1}(u_2 \mid u_1) = P(U_2 \le u_2 \mid U_1 = u_1)
= \lim_{\delta \to 0} \frac{C(u_1 + \delta, u_2) - C(u_1, u_2)}{\delta}
= \frac{\partial}{\partial u_1} C(u_1, u_2), \tag{5.15}$$

where this partial derivative exists almost everywhere (see Nelsen (1999) for precise
details). The conditional distribution is a distribution on the interval $[0, 1]$ which is
only a uniform distribution in the case where $C$ is the independence copula. A
risk-management interpretation of the conditional distribution is the following. Suppose
continuous risks $(X_1, X_2)$ have the (unique) copula $C$. Then $1 - C_{U_2 \mid U_1}(q \mid p)$ is
the probability that $X_2$ exceeds its $q$th quantile given that $X_1$ attains its $p$th quantile.
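The difference quotient in (5.15) can be evaluated directly and compared with a closed-form derivative. The sketch below does this for the Clayton copula; the closed form is obtained here by differentiating $(u_1^{-\theta} + u_2^{-\theta} - 1)^{-1/\theta}$ with respect to $u_1$, and the function names and the step size `delta` are assumptions made for illustration.

```python
def clayton(u1, u2, theta):
    """Bivariate Clayton copula."""
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)

def clayton_conditional(u2, u1, theta):
    """Closed form of dC/du1, i.e. P(U2 <= u2 | U1 = u1) for the Clayton copula."""
    s = u1 ** -theta + u2 ** -theta - 1.0
    return u1 ** (-theta - 1.0) * s ** (-1.0 / theta - 1.0)

def conditional_by_difference(copula, u2, u1, delta=1e-6):
    """Forward difference quotient from (5.15) for a generic bivariate copula."""
    return (copula(u1 + delta, u2) - copula(u1, u2)) / delta

theta = 2.2
numeric = conditional_by_difference(lambda a, b: clayton(a, b, theta), 0.7, 0.3)
print(numeric, clayton_conditional(0.7, 0.3, theta))

# Probability that X2 exceeds its 99% quantile given X1 sits at its 99% quantile.
print(1.0 - clayton_conditional(0.99, 0.99, theta))
```

For the independence copula the same difference quotient returns $u_2$, which is the uniform case mentioned in the text.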


Copula densities. Copulas do not always have joint densities; the comonotonicity
and countermonotonicity copulas are examples of copulas that are not absolutely
continuous. However, the parametric copulas that we have met so far do have
densities given by

$$c(u_1, \dots, u_d) = \frac{\partial^d C(u_1, \dots, u_d)}{\partial u_1 \cdots \partial u_d} \tag{5.16}$$

and we are sometimes required to calculate them, for example if we wish to fit
copulas to data by maximum likelihood.

It is useful to note that, for the implicit copula of an absolutely continuous joint df
$F$ with strictly increasing, continuous marginal dfs $F_1, \dots, F_d$, we may differentiate
$C(u_1, \dots, u_d) = F(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d))$ to see that the copula density is given
by

$$c(u_1, \dots, u_d) = \frac{f(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d))}{f_1(F_1^{-1}(u_1)) \cdots f_d(F_d^{-1}(u_d))}, \tag{5.17}$$

where $f$ is the joint density of $F$, $f_1, \dots, f_d$ are the marginal densities, and
$F_1^{-1}, \dots, F_d^{-1}$ are the ordinary inverses of the marginal dfs.


Using this technique we can calculate the densities of the Gaussian and t copulas 
as shown in Figures 5.5 and 5.6, respectively. Observe that the t copula assigns much 
more probability mass to the corners of the unit square; this may be explained by 
the tail dependence of the t copula, as discussed in Section 5.2.3. 
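As an illustration of (5.17), the bivariate Gauss copula density can be computed as the ratio of the joint normal density to the product of the marginal densities, both evaluated at the normal quantiles. This is a sketch with hypothetical function names; it relies only on Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

STD = NormalDist()   # standard normal: provides pdf and inv_cdf

def bivariate_normal_pdf(x1, x2, rho):
    """Density of the standard bivariate normal with correlation rho."""
    det = 1.0 - rho * rho
    quad = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def gauss_copula_density(u1, u2, rho):
    """Copula density via (5.17): joint density over product of marginal densities."""
    x1, x2 = STD.inv_cdf(u1), STD.inv_cdf(u2)
    return bivariate_normal_pdf(x1, x2, rho) / (STD.pdf(x1) * STD.pdf(x2))

print(gauss_copula_density(0.5, 0.5, 0.3))    # centre of the unit square
print(gauss_copula_density(0.99, 0.99, 0.3))  # upper-right corner
print(gauss_copula_density(0.3, 0.8, 0.0))    # independence case
```

With `rho = 0` the ratio collapses to one everywhere, which is the density of the independence copula.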


Exchangeability. 


Definition 5.14 (exchangeability). A random vector $X$ is exchangeable if

$$(X_1, \dots, X_d) \overset{d}{=} (X_{\Pi(1)}, \dots, X_{\Pi(d)})$$

for any permutation $(\Pi(1), \dots, \Pi(d))$ of $(1, \dots, d)$.

Figure 5.5. Perspective plot of the density of the bivariate
Gauss copula with parameter $\rho = 0.3$.


Figure 5.6. Perspective plot of the density of the bivariate
t copula with parameters $\nu = 4$ and $\rho = 0.3$.


We will refer to a copula as an exchangeable copula if it is the df of an exchangeable
random vector of uniform variates $U$. Clearly, for such a copula we must have

$$C(u_1, \dots, u_d) = C(u_{\Pi(1)}, \dots, u_{\Pi(d)}) \tag{5.18}$$

for all possible permutations of the arguments of $C$. Such copulas will prove useful
in modelling the default dependence for homogeneous groups of companies in the
context of credit risk.

Examples of exchangeable copulas include both the Gumbel and Clayton copulas
as well as the Gaussian and t copulas, $C_P^{\mathrm{Ga}}$ and $C_{\nu, P}^{t}$, in the case that $P$ is an
equicorrelation matrix, i.e. a matrix of the form $P = \rho J_d + (1 - \rho) I_d$, where $J_d$ is
the square matrix consisting entirely of ones and $\rho \ge -1/(d - 1)$.

It follows from (5.18) and (5.15) that if the df of the vector $(U_1, U_2)$ is an
exchangeable bivariate copula, then

$$P(U_2 \le u_2 \mid U_1 = u_1) = P(U_1 \le u_2 \mid U_2 = u_1), \tag{5.19}$$

which implies quite strong symmetry. If a random vector $(X_1, X_2)$ has such a copula,
then the probability that $X_2$ exceeds its $u_2$-quantile given that $X_1$ attains its
$u_1$-quantile is exactly the same as the probability that $X_1$ exceeds its $u_2$-quantile given
that $X_2$ attains its $u_1$-quantile. Not all bivariate copulas must satisfy (5.19). For an
example of a non-exchangeable bivariate copula see Section 5.4.3 and Figure 5.13.

5.1.6 Perfect Dependence


There are many equivalent ways of defining the concept of comonotonicity. We saw 
in Section 5.1.2 that continuously distributed rvs which are almost surely strictly 
increasing functions of one another have as copula the Fréchet upper bound. We will 
in fact use this copula to give a general definition of comonotonicity for any random 
vector (continuous margins or otherwise), and then look at an equivalent condition. 


Definition 5.15 (comonotonicity). The rvs $X_1, \dots, X_d$ are said to be comonotonic
if they admit as copula the Fréchet upper bound $M(u_1, \dots, u_d) = \min\{u_1, \dots, u_d\}$.


More insight into this definition is afforded by the following result, which essen- 
tially shows that comonotonic rvs are really only functions of a single rv. 


Proposition 5.16. $X_1, \dots, X_d$ are comonotonic if and only if

$$(X_1, \dots, X_d) \overset{d}{=} (v_1(Z), \dots, v_d(Z)) \tag{5.20}$$

for some rv $Z$ and increasing functions $v_1, \dots, v_d$.

Proof. Assume that $X_1, \dots, X_d$ are comonotonic according to Definition 5.15. Let
$U$ be any uniform rv and write $F, F_1, \dots, F_d$ for the joint df and marginal dfs of
$X_1, \dots, X_d$, respectively. From (5.2) we have

$$\begin{aligned}
F(x_1, \dots, x_d) &= \min\{F_1(x_1), \dots, F_d(x_d)\} \\
&= P(U \le \min\{F_1(x_1), \dots, F_d(x_d)\}) \\
&= P(U \le F_1(x_1), \dots, U \le F_d(x_d)) \\
&= P(F_1^{\leftarrow}(U) \le x_1, \dots, F_d^{\leftarrow}(U) \le x_d)
\end{aligned}$$

for any $U \sim U(0, 1)$, where we use Proposition A.3(iv) in the last equality. It follows
that

$$(X_1, \dots, X_d) \overset{d}{=} (F_1^{\leftarrow}(U), \dots, F_d^{\leftarrow}(U)), \tag{5.21}$$

which is of the form (5.20). Conversely, if (5.20) holds, then

$$F(x_1, \dots, x_d) = P(v_1(Z) \le x_1, \dots, v_d(Z) \le x_d) = P(Z \in A_1, \dots, Z \in A_d),$$

where each $A_i$ is an interval of the form $(-\infty, k_i]$ or $(-\infty, k_i)$, so one interval $A_i$
is a subset of all other intervals. Therefore,

$$F(x_1, \dots, x_d) = \min\{P(Z \in A_1), \dots, P(Z \in A_d)\} = \min\{F_1(x_1), \dots, F_d(x_d)\},$$

which proves comonotonicity.
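Representation (5.21) gives a direct recipe for simulating comonotonic risks: feed one uniform draw through every marginal quantile function. The sketch below uses an exponential and a normal margin as hypothetical examples; the clipping constant only guards the quantile functions against the endpoints 0 and 1.

```python
import math
import random
from statistics import NormalDist

def exp_quantile(u):
    """Quantile function of the standard exponential distribution."""
    return -math.log(1.0 - u)

normal_quantile = NormalDist().inv_cdf

def comonotonic_sample(quantile_functions, n, rng, eps=1e-12):
    """Apply every marginal quantile function to the same uniform draw, as in (5.21)."""
    out = []
    for _ in range(n):
        u = min(max(rng.random(), eps), 1.0 - eps)
        out.append(tuple(q(u) for q in quantile_functions))
    return out

sample = comonotonic_sample([exp_quantile, normal_quantile], 5, random.Random(42))
for x1, x2 in sorted(sample):
    print(round(x1, 4), round(x2, 4))   # both columns increase together
```

Sorting by the first component also sorts the second, which is the sample counterpart of each component being an increasing function of the same underlying variable.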


In the case of rvs with continuous marginal distributions we have a simpler and 
stronger result. 


Corollary 5.17. Let $X_1, \dots, X_d$ be rvs with continuous dfs. They are comonotonic
if and only if for every pair $(i, j)$ we have $X_j = T_{ji}(X_i)$ almost surely for some
increasing transformation $T_{ji}$.

Proof. The result follows from the proof of Proposition 5.16 by noting that the rv
$U$ may be taken to be $F_i(X_i)$ for any $i$. Without loss of generality set $d = 2$ and
$i = 1$ and use (5.21) and Proposition A.4 to obtain

$$(X_1, X_2) \overset{d}{=} (F_1^{\leftarrow} \circ F_1(X_1), F_2^{\leftarrow} \circ F_1(X_1)) = (X_1, F_2^{\leftarrow} \circ F_1(X_1)).$$


An important property of comonotonic risks is that their quantiles are additive 
and this is demonstrated in Proposition 6.15. 

In an analogous way to comonotonicity, we define countermonotonicity as a 
copula concept, albeit restricted to the case d = 2. 


Definition 5.18 (countermonotonicity). The rvs $X_1$ and $X_2$ are countermonotonic
if they have as copula the Fréchet lower bound $W(u_1, u_2) = \max\{u_1 + u_2 - 1, 0\}$.


Proposition 5.19. $X_1$ and $X_2$ are countermonotonic if and only if

$$(X_1, X_2) \overset{d}{=} (v_1(Z), v_2(Z))$$

for some rv $Z$ with $v_1$ increasing and $v_2$ decreasing, or vice versa.


Proof. The proof is similar to that of Proposition 5.16 and is given in Embrechts, 
McNeil and Straumann (2002). 


Remark 5.20. In the case where $X_1$ and $X_2$ are continuous we have the simpler
result that countermonotonicity is equivalent to $X_2 = T(X_1)$ almost surely for some
decreasing function $T$.


The concept of countermonotonicity does not generalize to higher dimensions. 
The Fréchet lower bound $W(u_1, \dots, u_d)$ is not itself a copula for $d > 2$ since it
is not a proper distribution function and does not satisfy (5.1), as the following 
example taken from Nelsen (1999, Exercise 2.35) shows. 


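Example 5.21 (the Fréchet lower bound is not a copula for $d > 2$). Consider
the $d$-cube $[1/2, 1]^d \subset [0, 1]^d$. If the Fréchet lower bound for copulas were a df
on $[0, 1]^d$, then (5.1) implies that the probability mass $P(d)$ of this cube would be
given by

$$\begin{aligned}
P(d) &= \max(1 + \cdots + 1 - d + 1, 0) - d \max(\tfrac12 + 1 + \cdots + 1 - d + 1, 0) \\
&\quad + \binom{d}{2} \max(\tfrac12 + \tfrac12 + 1 + \cdots + 1 - d + 1, 0) - \cdots \\
&\quad + (-1)^d \max(\tfrac12 + \cdots + \tfrac12 - d + 1, 0) \\
&= 1 - \tfrac12 d.
\end{aligned}$$

Hence the Fréchet lower bound cannot be a copula for $d > 2$.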
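The signed sum over the vertices of the cube can be evaluated mechanically. The following sketch (function names are hypothetical) reproduces the value $1 - d/2$ and shows that the would-be probability mass is negative as soon as $d > 2$.

```python
from itertools import product

def frechet_lower(u):
    """Frechet lower bound W(u_1, ..., u_d) = max(u_1 + ... + u_d - d + 1, 0)."""
    return max(sum(u) - len(u) + 1.0, 0.0)

def cube_mass(d, lower=0.5, upper=1.0):
    """Signed sum of W over the vertices of [lower, upper]^d."""
    total = 0.0
    for corner in product((lower, upper), repeat=d):
        n_lower = sum(1 for c in corner if c == lower)
        total += (-1) ** n_lower * frechet_lower(corner)   # sign by number of lower endpoints
    return total

for d in (2, 3, 4, 5):
    print(d, cube_mass(d), 1.0 - d / 2.0)
```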


Some additional insight into the impossibility of countermonotonicity for dimen- 
sions higher than two is given by the following simple example. 




Example 5.22. Let $X_1$ be a positive-valued rv and take $X_2 = 1/X_1$ and
$X_3 = \exp(-X_1)$. Clearly, $(X_1, X_2)$ and $(X_1, X_3)$ are countermonotonic random vectors.
However, $(X_2, X_3)$ is comonotonic and the copula of the vector $(X_1, X_2, X_3)$ is the
df of the vector $(U, 1 - U, 1 - U)$ which may be calculated to be

$$C(u_1, u_2, u_3) = \max\{\min\{u_2, u_3\} + u_1 - 1, 0\}.$$
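The calculation behind this formula is short enough to spell out. Assuming $U$ is uniform on $(0, 1)$, the three constraints reduce to a single interval for $U$:

```latex
\begin{aligned}
C(u_1, u_2, u_3) &= P(U \le u_1,\ 1 - U \le u_2,\ 1 - U \le u_3) \\
                 &= P\bigl(1 - \min\{u_2, u_3\} \le U \le u_1\bigr) \\
                 &= \max\{u_1 + \min\{u_2, u_3\} - 1,\ 0\}.
\end{aligned}
```

The last step uses the fact that the interval is empty whenever $u_1 < 1 - \min\{u_2, u_3\}$.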


Notes and Comments 


Sklar’s Theorem is first found in Sklar (1959); see also Schweizer and Sklar (1983) 
for a proof of the result. A systematic development of the theory of copulas, par- 
ticularly bivariate ones, with many examples is found in Nelsen (1999). Pitfalls 
related to discontinuity of marginal distributions are presented in Marshall (1996). 
For extensive lists of parametric copula families see Hutchinson and Lai (1990), 
Joe (1997) and Nelsen (1999). A recent reference on copula methods in finance is 
Cherubini, Luciano and Vecchiato (2004). 

The concept of comonotonicity or perfect positive dependence is discussed by 
many authors, including Schmeidler (1986) and Yaari (1987). See also Wang and 
Dhaene (1998), whose proof we use in Proposition 5.16, and the entry in the Encyclo- 
pedia of Actuarial Science by Vyncke (2004). 


5.2 Dependence Measures 


In this section we focus on three kinds of dependence measure: the usual Pearson 
linear correlation; rank correlation; and the coefficients of tail dependence. All of 
these dependence measures yield a scalar measurement for a pair of rvs (X1, X2), 
although the nature and properties of the measure are different in each case. 

Correlation plays a central role in financial theory, but it is important to realize 
that the concept is only really a natural one in the context of multivariate normal 
or, more generally, elliptical models. As we have seen, elliptical distributions are 
fully described by a mean vector, a covariance matrix and a characteristic genera- 
tor function. Since means and variances are features of marginal distributions, the 
copulas of elliptical distributions can be thought of as depending only on the corre- 
lation matrix and characteristic generator; the correlation matrix thus has a natural 
parametric role in these models, which it does not have in more general multivariate 
models. Our discussion of correlation will focus on the shortcomings of correlation 
and the subtle pitfalls that the naive user of correlation may encounter when moving 
away from elliptical models. The concept of copulas will help us to illustrate these 
pitfalls. 

The other two kinds of dependence measure—rank correlations and tail-depen- 
dence coefficients—are copula-based dependence measures. In contrast to ordinary 
correlation, these measures are functions of the copula only and can thus be used in 
the parametrization of copulas, as will be seen. 


5.2.1 Linear Correlation 


The correlation $\rho(X_1, X_2)$ between rvs $X_1$ and $X_2$ was defined in (3.3). It is a
measure of linear dependence and takes values in $[-1, 1]$. If $X_1$ and $X_2$ are independent,
then $\rho(X_1, X_2) = 0$, but it should be well known to all users of correlation that the
converse is false: the uncorrelatedness of $X_1$ and $X_2$ does not in general imply their
independence. Examples are provided by the class of uncorrelated normal mixture
distributions (see Lemma 3.5) and the class of spherical distributions (with the single
exception of the multivariate normal).

If $|\rho(X_1, X_2)| = 1$, then this is equivalent to saying that $X_2$ and $X_1$ are perfectly
linearly dependent, meaning that $X_2 = \alpha + \beta X_1$ almost surely for some $\alpha \in \mathbb{R}$ and
$\beta \ne 0$, with $\beta > 0$ for positive linear dependence and $\beta < 0$ for negative linear
dependence. Moreover, for $\beta_1, \beta_2 > 0$,

$$\rho(\alpha_1 + \beta_1 X_1, \alpha_2 + \beta_2 X_2) = \rho(X_1, X_2),$$

so correlation is invariant under strictly increasing linear transformations. However,
correlation is not invariant under nonlinear strictly increasing transformations
$T : \mathbb{R} \to \mathbb{R}$. For two real-valued rvs we have, in general,
$\rho(T(X_1), T(X_2)) \ne \rho(X_1, X_2)$.
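Both statements can be illustrated with a bivariate normal pair and the transformation $T(x) = e^x$. This is a sketch under stated assumptions: the closed form $(e^{\rho} - 1)/(e - 1)$ for the correlation of the exponentiated pair is the standard lognormal moment calculation for unit variances, and the function names, seed and sample size are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Sample linear correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_normal_pair(rho, n, rng):
    """Standard bivariate normal sample with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return xs, ys

def lognormal_correlation(rho):
    """Correlation of (exp(X1), exp(X2)) for a standard bivariate normal pair."""
    return (math.exp(rho) - 1.0) / (math.e - 1.0)

xs, ys = simulate_normal_pair(0.7, 20000, random.Random(11))
print(pearson(xs, ys))                                            # close to 0.7
print(pearson([2 * x + 1 for x in xs], [3 * y - 5 for y in ys]))  # unchanged
print(pearson([math.exp(x) for x in xs], [math.exp(y) for y in ys]),
      lognormal_correlation(0.7))                                 # noticeably lower
```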

Another obvious, but important, remark is that correlation is only defined when
the variances of $X_1$ and $X_2$ are finite. This restriction to finite-variance models is not
ideal for a dependence measure and can cause problems when we work with
heavy-tailed distributions. For example, actuaries who model losses in different business
lines with infinite-variance distributions cannot describe the dependence of their
risks using correlation. We will encounter similar examples in Section 10.1.4 on
operational risk.


Correlation fallacies. We now discuss two further pitfalls in the use of correlation, 
which we present in the form of fallacies. We believe these fallacies are worth high- 
lighting because they illustrate the dangers of attempting to construct multivariate 
risk models starting from marginal distributions, and ideas about the correlation 
between risks. Both of the statements we make are true if we restrict our attention 
to elliptically distributed risk factors, but are false in general. A third fallacy con- 
cerning correlation and VaR is presented later, in Section 6.2.2. For background to 
these fallacies, alternative examples and a discussion of the relevance to multivariate 
Monte Carlo simulation, see Embrechts, McNeil and Straumann (2002). 


Fallacy 1. The marginal distributions and pairwise correlations of a random vector 
determine its joint distribution. 


It should already be clear to readers of this chapter that this is not true. Figure 5.4
shows the key to constructing counterexamples. Suppose the rvs $X_1$ and $X_2$ have
continuous marginal distributions $F_1$ and $F_2$ and joint df $C(F_1(x_1), F_2(x_2))$ for
some copula $C$ and suppose their linear correlation is $\rho(X_1, X_2) = \rho$. It will
generally be possible to find an alternative copula $C_2 \ne C$ and to construct a random
vector $(Y_1, Y_2)$ with df $C_2(F_1(x_1), F_2(x_2))$ such that $\rho(Y_1, Y_2) = \rho$. The following
example illustrates this idea in a case where $\rho = 0$.

Example 5.23. Consider two rvs representing profits and losses on two portfolios.
Suppose we are given the information that both risks have standard normal
distributions and that their correlation is zero. We construct two random vectors that are
consistent with this information.

Model 1 is the standard bivariate normal $X \sim N_2(0, I_2)$. Model 2 is constructed by
taking $V$ to be an independent discrete rv such that $P(V = 1) = P(V = -1) = 0.5$
and setting $(Y_1, Y_2) = (X_1, V X_1)$ with $X_1$ as in model 1. This model obviously
also has normal margins and correlation zero; its copula is given by


$$C(u_1, u_2) = 0.5 \max\{u_1 + u_2 - 1, 0\} + 0.5 \min\{u_1, u_2\},$$


which is a mixture of the two-dimensional Fréchet-bound copulas. This could be 
roughly interpreted as representing two equiprobable states of the world: in one state 
financial outcomes in the two portfolios are comonotonic and we are certain to make 
money in both or lose money in both; in the other state they are countermonotonic 
and we will make money in one and lose money in the other. 

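We can calculate analytically the distribution of the total losses $X_1 + X_2$ and
$Y_1 + Y_2$; the latter sum does not itself have a univariate normal distribution. For
$k > 0$ we get that

$$P(X_1 + X_2 > k) = \bar\Phi(k/\sqrt{2}), \qquad P(Y_1 + Y_2 > k) = \tfrac12 \bar\Phi(\tfrac12 k),$$

where $\bar\Phi = 1 - \Phi$, from which it follows that, for $\alpha > 0.75$,

$$F^{\leftarrow}_{X_1 + X_2}(\alpha) = \sqrt{2}\, \Phi^{-1}(\alpha), \qquad F^{\leftarrow}_{Y_1 + Y_2}(\alpha) = 2 \Phi^{-1}(2\alpha - 1).$$

In Figure 5.7 we see that the quantile of $Y_1 + Y_2$ dominates that of $X_1 + X_2$ for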
probability levels above 93%. This example also illustrates that the VaR of a sum of 
risks is clearly not determined by marginal distributions and pairwise correlations. 
In Section 6.2 we will look at the problem of discovering how “bad” the quantile of 
the sum of two risks can be when the marginal distributions are known. 


The correlation of two risks does not only depend on their copula—if it did, then 
correlation would be invariant under strictly increasing transformations. Correlation 
is also inextricably linked to the marginal distributions of the risks and this imposes 
certain constraints on the values that correlation can take. This is the subject of the 
second fallacy. 


Fallacy 2. For given univariate distributions $F_1$ and $F_2$ and any correlation value $\rho$
in $[-1, 1]$ it is always possible to construct a joint distribution $F$ with margins $F_1$
and $F_2$ and correlation $\rho$.


Again, this statement is true if $F_1$ and $F_2$ are the margins of an elliptical
distribution, but is in general false. The so-called attainable correlations can form a strict
subset of the interval $[-1, 1]$, as is shown in the next theorem. In the proof of the
theorem we require the formula of Höffding, which is given in the next lemma.


Lemma 5.24. If $(X_1, X_2)$ has joint df $F$ and marginal dfs $F_1$ and $F_2$, then the
covariance of $X_1$ and $X_2$, when finite, is given by

$$\operatorname{cov}(X_1, X_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (F(x_1, x_2) - F_1(x_1) F_2(x_2)) \, dx_1 \, dx_2. \tag{5.22}$$

Figure 5.7. VaR for the risks $X_1 + X_2$ and $Y_1 + Y_2$ as described in Example 5.23. Both these
pairs have standard normal margins and a correlation of zero; $X_1$ and $X_2$ are independent,
whereas $Y_1$ and $Y_2$ are dependent.


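Proof. Let $(X_1, X_2)$ have df $F$ and let $(\tilde X_1, \tilde X_2)$ be an independent copy (i.e. a
second pair with df $F$ independent of $(X_1, X_2)$). We have

$$2 \operatorname{cov}(X_1, X_2) = E((X_1 - \tilde X_1)(X_2 - \tilde X_2)).$$

We now use a useful identity which says that, for any $a \in \mathbb{R}$ and $b \in \mathbb{R}$, we can
always write $(a - b) = \int_{-\infty}^{\infty} (I_{\{b \le x\}} - I_{\{a \le x\}}) \, dx$ and apply this to the random pairs
$(X_1 - \tilde X_1)$ and $(X_2 - \tilde X_2)$. We obtain

$$\begin{aligned}
2 \operatorname{cov}(X_1, X_2)
&= E\left( \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
   (I_{\{\tilde X_1 \le x_1\}} - I_{\{X_1 \le x_1\}})(I_{\{\tilde X_2 \le x_2\}} - I_{\{X_2 \le x_2\}}) \, dx_1 \, dx_2 \right) \\
&= 2 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
   (P(X_1 \le x_1, X_2 \le x_2) - P(X_1 \le x_1) P(X_2 \le x_2)) \, dx_1 \, dx_2.
\end{aligned}$$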
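Two steps of this proof are worth writing out. The indicator identity follows by considering the two possible orderings of $a$ and $b$, and the final equality follows by taking expectations inside the integral, using that the copy with tildes is independent of, and identically distributed to, the original pair:

```latex
\begin{aligned}
b < a:&\quad I_{\{b \le x\}} - I_{\{a \le x\}} = I_{\{b \le x < a\}},
       \quad\text{so the integral equals } a - b; \\
a < b:&\quad I_{\{b \le x\}} - I_{\{a \le x\}} = -I_{\{a \le x < b\}},
       \quad\text{so the integral equals } -(b - a) = a - b; \\[4pt]
&E\bigl( (I_{\{\tilde X_1 \le x_1\}} - I_{\{X_1 \le x_1\}})
         (I_{\{\tilde X_2 \le x_2\}} - I_{\{X_2 \le x_2\}}) \bigr) \\
&\qquad = F(x_1, x_2) - F_1(x_1) F_2(x_2) - F_1(x_1) F_2(x_2) + F(x_1, x_2) \\
&\qquad = 2\bigl( F(x_1, x_2) - F_1(x_1) F_2(x_2) \bigr).
\end{aligned}
```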


Theorem 5.25 (attainable correlations). Let $(X_1, X_2)$ be a random vector with
finite-variance marginal dfs $F_1$ and $F_2$ and an unspecified joint df; assume also that
$\operatorname{var}(X_1) > 0$ and $\operatorname{var}(X_2) > 0$. The following statements hold.


(1) The attainable correlations form a closed interval $[\rho_{\min}, \rho_{\max}]$ with
$\rho_{\min} < 0 < \rho_{\max}$.

(2) The minimum correlation $\rho = \rho_{\min}$ is attained if and only if $X_1$ and $X_2$ are
countermonotonic. The maximum correlation $\rho = \rho_{\max}$ is attained if and
only if $X_1$ and $X_2$ are comonotonic.

(3) $\rho_{\min} = -1$ if and only if $X_1$ and $-X_2$ are of the same type (see Section A.1.1),
and $\rho_{\max} = 1$ if and only if $X_1$ and $X_2$ are of the same type.


Proof. We begin with (2) and use the identity (5.22). We also recall the
two-dimensional Fréchet bounds for a general df in (5.5):

$$\max\{F_1(x_1) + F_2(x_2) - 1, 0\} \le F(x_1, x_2) \le \min\{F_1(x_1), F_2(x_2)\}.$$

Clearly, when $F_1$ and $F_2$ are fixed, the integrand in (5.22) is maximized pointwise
when $X_1$ and $X_2$ have the Fréchet upper bound copula $C(u_1, u_2) = \min\{u_1, u_2\}$,
i.e. when they are comonotonic. Similarly, the integrand is minimized when $X_1$ and
$X_2$ are countermonotonic.

To complete the proof of (1), note that clearly $\rho_{\max} \ge 0$. However, $\rho_{\max} = 0$ can
be ruled out since this would imply that $\min\{F_1(x_1), F_2(x_2)\} = F_1(x_1) F_2(x_2)$ for
all $x_1, x_2$. This can only occur if $F_1$ or $F_2$ is a degenerate distribution consisting of
point mass at a single point, but this is excluded by the assumption that variances
are non-zero. By a similar argument we have that $\rho_{\min} < 0$. If $W(F_1, F_2)$ and
$M(F_1, F_2)$ denote the Fréchet lower and upper bounds, respectively, then the mixture
$\lambda W(F_1, F_2) + (1 - \lambda) M(F_1, F_2)$, $0 \le \lambda \le 1$, has correlation $\lambda \rho_{\min} + (1 - \lambda) \rho_{\max}$.
Thus for any $\rho \in [\rho_{\min}, \rho_{\max}]$ we can set $\lambda = (\rho_{\max} - \rho)/(\rho_{\max} - \rho_{\min})$ to construct
a joint df that attains the correlation value $\rho$.

Part (3) is clear since $\rho_{\min} = -1$ or $\rho_{\max} = 1$ if and only if there is an almost
sure linear relationship between $X_1$ and $X_2$.


Example 5.26 (attainable correlations for lognormal rvs). An example where the
maximal and minimal correlations can be easily calculated occurs when
$\ln X_1 \sim N(0, 1)$ and $\ln X_2 \sim N(0, \sigma^2)$. For $\sigma \ne 1$ the lognormally distributed rvs $X_1$ and
$X_2$ are not of the same type (although $\ln X_1$ and $\ln X_2$ are) so that, by part (3) of
Theorem 5.25, we have $\rho_{\max} < 1$. The rvs $X_1$ and $-X_2$ are also not of the same
type, so $\rho_{\min} > -1$.

To calculate the actual boundaries of the attainable interval let $Z \sim N(0, 1)$ and
observe that if $X_1$ and $X_2$ are comonotonic, then $(X_1, X_2) \overset{d}{=} (e^{Z}, e^{\sigma Z})$. Clearly,
$\rho_{\max} = \rho(e^{Z}, e^{\sigma Z})$ and, by a similar argument, $\rho_{\min} = \rho(e^{Z}, e^{-\sigma Z})$. The analytical
calculation now follows easily and yields

$$\rho_{\min} = \frac{e^{-\sigma} - 1}{\sqrt{(e - 1)(e^{\sigma^2} - 1)}}, \qquad
\rho_{\max} = \frac{e^{\sigma} - 1}{\sqrt{(e - 1)(e^{\sigma^2} - 1)}}.$$

See Figure 5.8 for an illustration of the attainable correlation interval for different
values of $\sigma$ and note how the boundaries of the interval both tend rapidly to zero as
$\sigma$ is increased. This shows, for example, that we can have situations where
comonotonic rvs have very small correlation values. Since comonotonicity is the strongest
form of positive dependence, this provides a correction to the widely held view that
small correlations imply weak dependence.
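The bounds of Example 5.26 and the mixing weight from the proof of Theorem 5.25 can be tabulated with a few lines of code. This is a sketch with hypothetical function names; it simply evaluates the closed-form expressions for the lognormal case.

```python
import math

def attainable_correlations(sigma):
    """(rho_min, rho_max) for ln X1 ~ N(0, 1) and ln X2 ~ N(0, sigma^2)."""
    denom = math.sqrt((math.e - 1.0) * (math.exp(sigma ** 2) - 1.0))
    return (math.exp(-sigma) - 1.0) / denom, (math.exp(sigma) - 1.0) / denom

def mixing_weight(rho, sigma):
    """Weight lambda on the countermonotonic bound that yields correlation rho."""
    rho_min, rho_max = attainable_correlations(sigma)
    if not rho_min <= rho <= rho_max:
        raise ValueError("correlation is not attainable for these margins")
    return (rho_max - rho) / (rho_max - rho_min)

for sigma in (0.5, 1.0, 2.0, 3.0, 5.0):
    print(sigma, attainable_correlations(sigma))
print(mixing_weight(0.0, 2.0))
```

For large `sigma` the interval shrinks towards zero from both sides, so a requested correlation of, say, 0.5 quickly becomes unattainable.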


A common message can be extracted from both the fallacies of this section: 
namely that the concept of correlation is meaningless unless applied in the context 
of a well-defined joint model. Any interpretation of correlation values in the absence 
of such a model should be avoided.

Figure 5.8. Maximum and minimum attainable correlations for lognormal rvs $X_1$ and $X_2$,
where $\ln(X_1)$ is standard normal and $\ln(X_2)$ is normal with mean zero and variance $\sigma^2$.


5.2.2 Rank Correlation 


Rank correlations are simple scalar measures of dependence that depend only on 
the copula of a bivariate distribution and not on the marginal distributions, unlike 
linear correlation, which depends on both. The standard empirical estimators of rank 
correlation may be calculated by looking at the ranks of the data alone, hence the 
name. In other words, we only need to know the ordering of the sample for each 
variable of interest and not the actual numerical values. 

The main practical reason for looking at rank correlations is that they can be used to 
calibrate copulas to empirical data. At a theoretical level, being direct functionals of 
the copula, rank correlations have more appealing properties than linear correlations, 
as is discussed below. There are two main varieties of rank correlation, Kendall’s 
and Spearman’s, which we discuss in turn. 


Kendall’s tau. Kendall’s rank correlation can be understood as a measure of
concordance for bivariate random vectors. Two points in $\mathbb{R}^2$, denoted by $(x_1, x_2)$ and
$(\tilde x_1, \tilde x_2)$, are said to be concordant if $(x_1 - \tilde x_1)(x_2 - \tilde x_2) > 0$ and to be discordant if
$(x_1 - \tilde x_1)(x_2 - \tilde x_2) < 0$. Now consider a random vector $(X_1, X_2)$ and an independent
copy $(\tilde X_1, \tilde X_2)$ (i.e. a second vector with the same distribution, but independent
of the first). If $X_2$ tends to increase with $X_1$, then we expect the probability of
concordance to be high relative to the probability of discordance; if $X_2$ tends to decrease
with increasing $X_1$, then we expect the opposite. This motivates Kendall’s rank
correlation, which is simply the probability of concordance minus the probability of
discordance for these pairs:

$$\rho_\tau(X_1, X_2) = P((X_1 - \tilde X_1)(X_2 - \tilde X_2) > 0) - P((X_1 - \tilde X_1)(X_2 - \tilde X_2) < 0). \tag{5.23}$$

It is easily seen that there is a more compact way of writing this as an expectation,
which also leads to an obvious estimator in Section 5.5.1.

Definition 5.27. For rvs $X_1$ and $X_2$ Kendall’s tau is given by

$$\rho_\tau(X_1, X_2) = E(\operatorname{sign}((X_1 - \tilde X_1)(X_2 - \tilde X_2))),$$

where $(\tilde X_1, \tilde X_2)$ is an independent copy of $(X_1, X_2)$.


In higher dimensions the Kendall’s tau matrix of a random vector may be written
as $\rho_\tau(X) = \operatorname{cov}(\operatorname{sign}(X - \tilde X))$, where $\tilde X$ is an independent copy of $X$; since it can
be expressed as a covariance matrix, $\rho_\tau(X)$ is obviously positive semidefinite.
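A sample analogue of Definition 5.27 replaces the expectation by an average of signs over all pairs of observations. The sketch below is a naive quadratic-time version with hypothetical names, intended only to illustrate the definition; the text defers its own estimator to Section 5.5.1.

```python
import math
from itertools import combinations

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def kendall_tau(xs, ys):
    """Average sign of (x_i - x_j)(y_i - y_j) over all pairs i < j."""
    pairs = list(zip(xs, ys))
    n = len(pairs)
    total = sum(sign((a1 - b1) * (a2 - b2))
                for (a1, a2), (b1, b2) in combinations(pairs, 2))
    return total / (n * (n - 1) / 2)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 4.0]
print(kendall_tau(xs, ys))
# Strictly increasing transformations leave the value unchanged.
print(kendall_tau([math.exp(x) for x in xs], [y ** 3 for y in ys]))
```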


Spearman’s rho. This measure can also be defined in terms of concordance and 
discordance for random pairs (see Kruskal 1958, p. 824) but the most intuitive 
definition for our purposes involves copulas. 


Definition 5.28. For rvs $X_1$ and $X_2$ with marginal dfs $F_1$ and $F_2$ Spearman’s rho
is given by $\rho_S(X_1, X_2) = \rho(F_1(X_1), F_2(X_2))$.


In other words, Spearman’s rho is simply the linear correlation of the
probability-transformed rvs, which for continuous rvs is the linear correlation of their unique
copula. The Spearman’s rho matrix for the general multivariate random vector $X$ is
given by $\rho_S(X) = \rho(F_1(X_1), \dots, F_d(X_d))$ and must again be positive semidefinite.
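In a sample, the probability transform in Definition 5.28 can be approximated by ranks, which gives the familiar rank-based version of Spearman's rho. The following is a minimal sketch with hypothetical names; it assumes there are no ties in the data.

```python
import math

def ranks(values):
    """Ranks 1..n of the observations (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson(xs, ys):
    """Sample linear correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Linear correlation of the ranks, a sample stand-in for rho(F1(X1), F2(X2))."""
    return pearson(ranks(xs), ranks(ys))

xs = [0.1, 0.5, 0.2, 0.9, 0.7]
ys = [1.0, 2.0, 2.5, 8.0, 3.0]
print(spearman_rho(xs, ys))
print(spearman_rho([math.exp(x) for x in xs], [y ** 3 for y in ys]))  # same value
```

Because only the ordering of each variable enters, any strictly increasing transformation of the data leaves the result unchanged.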


Properties of rank correlation. Kendall’s tau and Spearman’s rho have many prop- 
erties in common. They are both symmetric dependence measures taking values in 
the interval [—1, 1]. They give the value zero for independent rvs, although a rank 
correlation of 0 does not necessarily imply independence. It can be shown that they 
take the value 1 when $X_1$ and $X_2$ are comonotonic (see Embrechts, McNeil and
Straumann 2002) and the value —1 when they are countermonotonic (which con- 
trasts with the behaviour of linear correlation observed in Theorem 5.25). Now we 
will show that, for continuous marginal distributions, both rank correlations depend 
only on the unique copula of the risks and thus inherit its property of invariance 
under strictly increasing transformations. 

Proposition 5.29. Suppose $X_1$ and $X_2$ have continuous marginal distributions and
unique copula $C$. Then the rank correlations are given by

$$\rho_\tau(X_1, X_2) = 4 \int_0^1 \int_0^1 C(u_1, u_2) \, dC(u_1, u_2) - 1, \tag{5.24}$$

$$\rho_S(X_1, X_2) = 12 \int_0^1 \int_0^1 (C(u_1, u_2) - u_1 u_2) \, du_1 \, du_2. \tag{5.25}$$

Proof. It follows easily from (5.23) that we can also write

$$\rho_\tau(X_1, X_2) = 2 P((X_1 - \tilde X_1)(X_2 - \tilde X_2) > 0) - 1,$$

and from the interchangeability of the pairs $(X_1, X_2)$ and $(\tilde X_1, \tilde X_2)$ we have

$$\begin{aligned}
\rho_\tau(X_1, X_2) &= 4 P(X_1 < \tilde X_1, X_2 < \tilde X_2) - 1 \\
&= 4 E(P(X_1 < \tilde X_1, X_2 < \tilde X_2 \mid \tilde X_1, \tilde X_2)) - 1 \\
&= 4 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} P(X_1 < x_1, X_2 < x_2) \, dF(x_1, x_2) - 1.
\end{aligned} \tag{5.26}$$

Since $X_1$ and $X_2$ have continuous margins, we infer that

$$\rho_\tau(X_1, X_2) = 4 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} C(F_1(x_1), F_2(x_2)) \, dC(F_1(x_1), F_2(x_2)) - 1,$$

from which (5.24) follows upon substituting $u_1 := F_1(x_1)$ and $u_2 := F_2(x_2)$. For
Spearman’s rho observe that $\rho_S(X_1, X_2) = 12 \operatorname{cov}(F_1(X_1), F_2(X_2))$, since $F_i(X_i)$
has a uniform distribution with variance $\tfrac{1}{12}$. Formula (5.25) follows upon applying
Höffding’s formula (5.22).


To what extent do the two fallacies of linear correlation identified in Section 5.2.1 
carry over to rank correlation? Clearly, Fallacy 1 remains relevant: marginal distri- 
butions and pairwise rank correlations do not fully determine the joint distribution 
of a vector of risks. However, Fallacy 2 is essentially taken out of play when we 
consider rank correlations: for any choice of continuous marginal distributions it 
is possible to specify a bivariate distribution that has any desired rank correlation 
value in [—1, 1]. One way of doing this is to take a convex combination of the form 


$$F(x_1, x_2) = \lambda W(F_1(x_1), F_2(x_2)) + (1 - \lambda) M(F_1(x_1), F_2(x_2)),$$


where $W$ and $M$ are the countermonotonicity and comonotonicity copulas,
respectively. A random pair $(X_1, X_2)$ with this df has rank correlation

$$\rho_\tau(X_1, X_2) = \rho_S(X_1, X_2) = 1 - 2\lambda,$$

which yields any desired value in $[-1, 1]$ for an appropriate choice of $\lambda$ in $[0, 1]$. But
this is only one of many possible constructions; a model with the Gauss copula of the
form $F(x_1, x_2) = C_\rho^{\mathrm{Ga}}(F_1(x_1), F_2(x_2))$ can also be parametrized by an appropriate
choice of $\rho \in [-1, 1]$ to have any rank correlation in $[-1, 1]$. In Section 5.3.2
we will explicitly calculate Spearman’s rank correlation coefficients for the Gauss 
copula, and Kendall’s tau values for the Gauss copula and other copulas of normal 
variance mixture distributions. 
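The value $1 - 2\lambda$ for Spearman's rho of the convex combination can be confirmed by evaluating (5.25) numerically. The sketch below uses a plain midpoint rule on the unit square; the function names and the grid size are hypothetical choices.

```python
def W(u1, u2):
    """Countermonotonicity copula."""
    return max(u1 + u2 - 1.0, 0.0)

def M(u1, u2):
    """Comonotonicity copula."""
    return min(u1, u2)

def mixture(lam):
    """Convex combination lam * W + (1 - lam) * M."""
    return lambda u1, u2: lam * W(u1, u2) + (1.0 - lam) * M(u1, u2)

def spearman_from_copula(copula, n=200):
    """Midpoint-rule approximation of 12 * integral of (C(u1, u2) - u1 * u2)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u1 = (i + 0.5) * h
        for j in range(n):
            u2 = (j + 0.5) * h
            total += copula(u1, u2) - u1 * u2
    return 12.0 * total * h * h

for lam in (0.0, 0.25, 0.5, 1.0):
    print(lam, spearman_from_copula(mixture(lam)), 1.0 - 2.0 * lam)
```

The linearity in $\lambda$ is visible directly, since the integrand in (5.25) is linear in the copula.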


5.2.3 Coefficients of Tail Dependence 


Like the rank correlations, the coefficients of tail dependence are measures of
pairwise dependence that depend only on the copula of a pair of rvs $X_1$ and $X_2$ with
continuous marginal dfs. The motivation for looking at these coefficients is that they
provide measures of extremal dependence or, in other words, measures of the strength 
of dependence in the tails of a bivariate distribution. The coefficients we describe 
are defined in terms of limiting conditional probabilities of quantile exceedances. 
We note that there are a number of other definitions of tail-dependence measures in 
the literature (see Notes and Comments). 

In the case of upper tail dependence we look at the probability that $X_2$ exceeds
its $q$-quantile, given that $X_1$ exceeds its $q$-quantile, and then consider the limit as
$q$ goes to 1 from below. Obviously the roles of $X_1$ and $X_2$ are interchangeable. Formally
we have the following.

Definition 5.30. Let $X_1$ and $X_2$ be rvs with dfs $F_1$ and $F_2$. The coefficient of upper
tail dependence of $X_1$ and $X_2$ is

$$\lambda_u := \lambda_u(X_1, X_2) = \lim_{q \to 1^-} P(X_2 > F_2^{\leftarrow}(q) \mid X_1 > F_1^{\leftarrow}(q)),$$

provided a limit $\lambda_u \in [0, 1]$ exists. If $\lambda_u \in (0, 1]$, then $X_1$ and $X_2$ are said to show
upper tail dependence or extremal dependence in the upper tail; if $\lambda_u = 0$, they are
asymptotically independent in the upper tail. Analogously, the coefficient of lower
tail dependence is

$$\lambda_l := \lambda_l(X_1, X_2) = \lim_{q \to 0^+} P(X_2 \le F_2^{\leftarrow}(q) \mid X_1 \le F_1^{\leftarrow}(q)),$$

provided a limit $\lambda_l \in [0, 1]$ exists.


If $F_1$ and $F_2$ are continuous dfs, then we get simple expressions for $\lambda_l$ and $\lambda_u$
in terms of the unique copula $C$ of the bivariate distribution. Using elementary
conditional probability and (5.3) we have

\[ \lambda_l = \lim_{q \to 0^+} \frac{P(X_2 \le F_2^{\leftarrow}(q),\, X_1 \le F_1^{\leftarrow}(q))}{P(X_1 \le F_1^{\leftarrow}(q))} = \lim_{q \to 0^+} \frac{C(q, q)}{q}. \tag{5.27} \]

For upper tail dependence we use (5.13) to obtain

\[ \lambda_u = \lim_{q \to 1^-} \frac{\hat{C}(1 - q, 1 - q)}{1 - q} = \lim_{q \to 0^+} \frac{\hat{C}(q, q)}{q}, \tag{5.28} \]

where $\hat{C}$ is the survival copula of $C$ (see (5.14)). For radially symmetric copulas we
must have $\lambda_l = \lambda_u$, since $C = \hat{C}$ for such copulas.

Calculation of these coefficients is straightforward if the copula in question has a 
simple closed form, as is the case for the Gumbel copula in (5.11) and the Clayton 
copula in (5.12). In Section 5.3.1 we will use a slightly more involved method 
to calculate tail-dependence coefficients for copulas of normal variance mixture 
distributions, such as the Gaussian and t copulas.


Example 5.31 (Gumbel and Clayton copulas). Writing $\hat{C}_\theta^{Gu}$ for the Gumbel survival
copula we first use (5.14) to infer that

\[ \lambda_u = \lim_{q \to 1^-} \frac{\hat{C}_\theta^{Gu}(1 - q, 1 - q)}{1 - q} = 2 - \lim_{q \to 1^-} \frac{C_\theta^{Gu}(q, q) - 1}{q - 1}. \]

We now use L’Hôpital’s rule and the fact that $C_\theta^{Gu}(u, u) = u^{2^{1/\theta}}$ to infer that

\[ \lambda_u = 2 - \lim_{q \to 1^-} \frac{dC_\theta^{Gu}(q, q)}{dq} = 2 - 2^{1/\theta}. \]

Provided that $\theta > 1$, the Gumbel copula has upper tail dependence. The strength of
this tail dependence tends to 1 as $\theta \to \infty$, which is to be expected since the Gumbel
copula tends to the comonotonicity copula as $\theta \to \infty$. Using a similar technique
the coefficient of lower tail dependence for the Clayton copula may be shown to be
$\lambda_l = 2^{-1/\theta}$ for $\theta > 0$.

The consequences of the lower tail dependence of the Clayton copula and the
upper tail dependence of the Gumbel copula can be seen in Figures 5.3 and 5.4,
where there is obviously an increased tendency for these copulas to generate joint
extreme values in the respective corners. In Section 5.3.1 we will see that the Gauss
copula is asymptotically independent in both tails, while the t copula has both upper
and lower tail dependence of the same magnitude (due to its radial symmetry).
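
The limits in Example 5.31 can be approximated by evaluating the ratios in (5.27) and (5.28) at a quantile level close to the boundary. The sketch below assumes the bivariate Gumbel and Clayton forms of (5.11) and (5.12) restricted to the diagonal; the function names and the choices $\theta = 2$ and $q = 10^{-6}$ are illustrative.

```python
def gumbel_diag(u, theta):
    """C_theta^Gu(u, u) = u ** (2 ** (1 / theta)), theta >= 1."""
    return u ** (2.0 ** (1.0 / theta))

def clayton_diag(u, theta):
    """C_theta^Cl(u, u) = (2 * u ** -theta - 1) ** (-1 / theta), theta > 0."""
    return (2.0 * u ** (-theta) - 1.0) ** (-1.0 / theta)

def upper_tail_ratio(diag, q, theta):
    """Finite-q version of (5.28): survival copula on the diagonal over 1 - q."""
    return (1.0 - 2.0 * q + diag(q, theta)) / (1.0 - q)

def lower_tail_ratio(diag, q, theta):
    """Finite-q version of (5.27): C(q, q) / q."""
    return diag(q, theta) / q

theta = 2.0
gumbel_num = upper_tail_ratio(gumbel_diag, 1.0 - 1e-6, theta)
clayton_num = lower_tail_ratio(clayton_diag, 1e-6, theta)
gumbel_exact = 2.0 - 2.0 ** (1.0 / theta)      # closed form from Example 5.31
clayton_exact = 2.0 ** (-1.0 / theta)          # closed form from Example 5.31
print(gumbel_num, gumbel_exact)
print(clayton_num, clayton_exact)
```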


Notes and Comments 


The discussion of correlation fallacies is based on Embrechts, McNeil and Strau- 
mann (2002), which contains a number of other examples illustrating these pitfalls. 
For Höffding’s formula and its use in proving the bounds on attainable correlations
see Höffding (1940), Fréchet (1951) and Shea (1983).

Useful references for rank correlations are Kruskal (1958) and Joag-Dev (1984). 
The relationship between rank correlation and copulas is discussed in Schweizer 
and Wolff (1981) and Nelsen (1999). The definition of tail dependence that we use 
stems from Joe (1993, 1997). There are a number of alternative definitions of tail- 
dependence measures, as discussed, for example, in Coles, Heffernan and Tawn 
(1999). 


5.3 Normal Mixture Copulas 


A unique copula is contained in every multivariate distribution with continuous 
marginal distributions, and a useful class of parametric copulas are those contained 
in the multivariate normal mixture distributions of Section 3.2. We view these cop- 
ulas as particularly important in market risk applications; indeed, in most cases, 
these copulas are used implicitly, without the user necessarily recognizing the fact. 
Whenever normal mixture distributions are fitted to multivariate return data or used 
as innovation distributions in multivariate time series models, normal mixture cop- 
ulas are used. They are also found in a number of credit risk models, both implicitly 
and explicitly; an example is Li’s model in Example 8.7. 

In this section we first focus on normal variance mixture copulas; in Section 5.3.1 
we examine their tail-dependence properties; and in Section 5.3.2 we calculate rank 
correlation coefficients, which are useful for calibrating these copulas to data. Then, 
in Sections 5.3.3 and 5.3.4, we look at more exotic examples of copulas arising from 
multivariate normal mixture constructions. 


5.3.1 Tail Dependence 


Coefficients of tail dependence. Consider a pair of uniform rvs $(U_1, U_2)$ whose
distribution $C(u_1, u_2)$ is a normal variance mixture copula. Due to the radial symmetry
of $C$ (see Section 5.1.5), it suffices to consider the formula for the lower
tail-dependence coefficient in (5.27) to calculate the coefficient of tail dependence
$\lambda$ of $C$. By applying L’Hôpital’s rule and using (5.15) we obtain

\[ \lambda = \lim_{q \to 0^+} \frac{dC(q, q)}{dq} = \lim_{q \to 0^+} P(U_2 \le q \mid U_1 = q) + \lim_{q \to 0^+} P(U_1 \le q \mid U_2 = q). \]

Since $C$ is exchangeable we have from (5.19) that

\[ \lambda = 2 \lim_{q \to 0^+} P(U_2 \le q \mid U_1 = q). \tag{5.29} \]

We now show the interesting contrast between the Gaussian and t copulas that we
alluded to in Example 5.11, namely that the t copula has tail dependence, whereas
the Gauss copula is asymptotically independent in the tail.


Example 5.32 (asymptotic independence of the Gauss copula). To evaluate
the tail-dependence coefficient for the Gauss copula $C_\rho^{Ga}$ let $(X_1, X_2) :=
(\Phi^{-1}(U_1), \Phi^{-1}(U_2))$, so that $(X_1, X_2)$ has a bivariate normal distribution with
standard margins and correlation $\rho$. It follows from (5.29) that

\[ \lambda = 2 \lim_{q \to 0^+} P(\Phi^{-1}(U_2) \le \Phi^{-1}(q) \mid \Phi^{-1}(U_1) = \Phi^{-1}(q)) = 2 \lim_{x \to -\infty} P(X_2 \le x \mid X_1 = x). \]

Using the fact that $X_2 \mid X_1 = x \sim N(\rho x, 1 - \rho^2)$, it can be calculated that

\[ \lambda = 2 \lim_{x \to -\infty} \Phi\left(x \sqrt{1 - \rho} / \sqrt{1 + \rho}\right) = 0, \]

provided $\rho < 1$. Hence, the Gauss copula is asymptotically independent in both
tails. Regardless of how high a correlation we choose, if we go far enough into the
tail, extreme events appear to occur independently in each margin.
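
To see how slowly this limit is approached when the correlation is high, the expression inside the limit can be evaluated at increasingly extreme values of $x$. This is a small sketch using the standard-library `statistics.NormalDist`; the function name and the value $\rho = 0.9$ are illustrative choices.

```python
import math
from statistics import NormalDist

def gauss_conditional_tail(x, rho):
    """2 * P(X2 <= x | X1 = x) for a standard bivariate normal with correlation rho."""
    return 2.0 * NormalDist().cdf(x * math.sqrt(1.0 - rho) / math.sqrt(1.0 + rho))

rho = 0.9
xs = (-1.0, -2.0, -4.0, -8.0, -16.0)
values = [gauss_conditional_tail(x, rho) for x in xs]
for x, v in zip(xs, values):
    print(x, v)
```

The printed values decrease towards zero, but only gradually for $\rho$ close to 1, which is why the finite-quantile comparisons later in this section are informative.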


Example 5.33 (asymptotic dependence of the t copula). To evaluate the tail-dependence
coefficient for the t copula $C_{\nu,\rho}^t$ let $(X_1, X_2) := (t_\nu^{-1}(U_1), t_\nu^{-1}(U_2))$,
where $t_\nu$ denotes the df of a univariate t distribution with $\nu$ degrees of freedom. Thus
$(X_1, X_2) \sim t_2(\nu, 0, P)$, where $P$ is a correlation matrix with off-diagonal element $\rho$.
By calculating the conditional density from the joint and marginal densities of a
bivariate t distribution, it may be verified that, conditional on $X_1 = x$,

\[ \left(\frac{\nu + 1}{\nu + x^2}\right)^{1/2} \frac{X_2 - \rho x}{\sqrt{1 - \rho^2}} \sim t_{\nu+1}. \tag{5.30} \]

Using an argument similar to Example 5.32 we find that

\[ \lambda = 2 t_{\nu+1}\left(-\sqrt{\frac{(\nu + 1)(1 - \rho)}{1 + \rho}}\right). \tag{5.31} \]

Provided that $\rho > -1$, the copula of the bivariate t distribution is asymptotically
dependent in both the upper and lower tail.

In Table 5.1 we tabulate the coefficient of tail dependence for various values of
$\nu$ and $\rho$. For fixed $\rho$ the strength of the tail dependence increases as $\nu$ decreases
and for fixed $\nu$ tail dependence increases as $\rho$ increases. Even for zero or negative
correlation values there is some tail dependence. This is not too surprising and can
be grasped intuitively by recalling from Section 3.2.1 that the t distribution is a
normal mixture distribution with a mixing variable $W$ whose distribution is inverse
gamma (which is a heavy-tailed distribution): if $|X_1|$ is large, there is a good chance
that this is because $W$ is large, increasing the probability of $|X_2|$ being large.

Table 5.1. Values of $\lambda$, the coefficient of upper and lower tail dependence, for the t copula
$C_{\nu,\rho}^t$ for various values of $\nu$, the degrees of freedom, and $\rho$, the correlation. The last row
represents the Gauss copula.

  nu \ rho    -0.5    0       0.5     0.9
  2           0.06    0.18    0.39    0.72
  4           0.01    0.08    0.25    0.63
  10          0.00    0.01    0.08    0.46
  infinity    0       0       0       0
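
The entries of Table 5.1 follow from formula (5.31). The sketch below evaluates that formula with a hand-written Student t distribution function so that it needs only the standard library; the helper `t_cdf`, its integration scheme and the grid of $\rho$ values are assumptions made for illustration, not part of the original text.

```python
import math

def t_cdf(x, nu, n=2000):
    """Distribution function of Student t with nu degrees of freedom.

    The substitution x = sqrt(nu) * tan(theta) turns the density integral into
    const * integral of cos(theta) ** (nu - 1), evaluated here by Simpson's rule.
    """
    const = math.gamma((nu + 1) / 2.0) / (math.sqrt(math.pi) * math.gamma(nu / 2.0))
    upper = math.atan(abs(x) / math.sqrt(nu))
    h = upper / n                                   # n must be even for Simpson's rule
    total = 1.0 + math.cos(upper) ** (nu - 1)       # endpoint terms
    for i in range(1, n):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (nu - 1)
    half_mass = const * total * h / 3.0             # P(0 < T <= |x|)
    return 0.5 + math.copysign(half_mass, x)

def t_copula_tail_dependence(nu, rho):
    """Coefficient of tail dependence of the t copula, formula (5.31)."""
    return 2.0 * t_cdf(-math.sqrt((nu + 1) * (1.0 - rho) / (1.0 + rho)), nu + 1)

rhos = (-0.5, 0.0, 0.5, 0.9)
table = {nu: [round(t_copula_tail_dependence(nu, rho), 2) for rho in rhos]
         for nu in (2, 4, 10)}
for nu, row in table.items():
    print(nu, row)
```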


We could use the same method used in the previous examples to calculate tail- 
dependence coefficients for other copulas of normal variance mixtures. In doing so 
we would find that most examples, such as copulas of symmetric hyperbolic or NIG 
distributions, fell into the same category as the Gauss copula and were asymptotically 
independent in the tails. The essential determinant of whether the copula of a normal 
variance mixture has tail dependence or not is the tail of the distribution of the mixing 
variable W in Definition 3.4. If W has a distribution with a power tail, then we get 
tail dependence, otherwise we get asymptotic independence. This is a consequence 
of a general result for elliptical distributions given in Section 7.3.3. 


Joint quantile exceedance probabilities. Coefficients of tail dependence are of 
course asymptotic quantities, and in the remainder of this section we look at joint 
exceedances of finite high quantiles for the Gauss and t copulas in order to learn more 
about the practical consequences of the differences between the extremal behaviours 
of these two models. 

As motivation we consider Figure 5.9, where 5000 simulated points from four different
distributions are displayed. The distributions in (a) and (b) are meta-Gaussian
distributions (see Section 5.1.3); they share the same copula $C_\rho^{Ga}$. The distributions
in (c) and (d) are meta-t distributions; they share the same copula $C_{\nu,\rho}^t$. The
values of $\nu$ and $\rho$ in all parts are 4 and 0.5, respectively. The distributions in (a)
and (c) share the same margins, namely standard normal margins. The distributions
in (b) and (d) both have Student t margins with four degrees of freedom.
The distributions in (a) and (d) are, of course, elliptical, being a standard bivariate
normal and a bivariate t distribution with four degrees of freedom; they both
have linear correlation $\rho = 0.5$. The other distributions are not elliptical and do
not necessarily have linear correlation 50%, since altering the margins alters the
linear correlation. All four distributions have identical Kendall’s tau values (see
Proposition 5.37). The meta-Gaussian distributions have the same Spearman’s rho
value, as do the meta-t distributions, although the two values are not identical (see
Section 5.3.2).

The vertical and horizontal lines mark the true theoretical 0.005 and 0.995 quantiles
for all distributions. Note that for the meta-t distributions the number of points
that lie below both 0.005 quantiles or exceed both 0.995 quantiles is clearly greater
than for the meta-Gaussian distributions, and this can be explained by the tail dependence
of the t copula. The true theoretical ratio by which the number of these joint
exceedances in the meta-t models should exceed the number in the meta-Gaussian
models is 2.79, as may be read from Table 5.2, whose interpretation we now discuss.

Figure 5.9. Five thousand simulated points from four distributions. (a) Standard bivariate
normal with correlation parameter $\rho = 0.5$. (b) Meta-Gaussian distribution with copula $C_\rho^{Ga}$
and Student t margins with four degrees of freedom. (c) Meta-t distribution with copula
$C_{4,\rho}^t$ and standard normal margins. (d) Standard bivariate t distribution with four degrees of
freedom and correlation parameter $\rho = 0.5$. Horizontal and vertical lines mark the 0.005 and
0.995 quantiles. See Section 5.3.1 for a commentary.

In Table 5.2 we have calculated values of $C_{\nu,\rho}^t(u, u) / C_\rho^{Ga}(u, u)$ for various $\rho$
and $\nu$ and $u = 0.05, 0.01, 0.005, 0.001$. The rows marked Gauss contain values of
$C_\rho^{Ga}(u, u)$, which is the probability that two rvs with this copula are below their
$u$-quantiles; we term this event a joint quantile exceedance (thinking of exceedance
in the downwards direction). Obviously it is identical to the probability that both rvs
are larger than their $(1 - u)$-quantiles. The remaining rows give the values of the ratio
and thus express the amount by which the joint quantile exceedance probabilities
must be inflated when we move from models with a Gauss copula to models with a
t copula.

In Table 5.3 we extend Table 5.2 to higher dimensions. We now focus only on
joint exceedances of the 1% (or 99%) quantile(s). We tabulate values of the ratio
$C_{\nu,P}^t(u, \ldots, u) / C_P^{Ga}(u, \ldots, u)$, where $P$ is an equicorrelation matrix with all
correlations equal to $\rho$. It is noticeable that not only do these values grow as the
correlation parameter or number of degrees of freedom falls, but they also grow with the
dimension of the copula. The next example gives an interpretation of one of these
numbers.

Table 5.2. Joint quantile exceedance probabilities for bivariate Gauss and t copulas with
correlation parameter values of 0.5 and 0.7. For Gauss copulas the probability of joint quantile
exceedance is given; for the t copulas the factors by which the Gaussian probability must be
multiplied are given.

                            Quantile
  rho   Copula   nu    0.05           0.01           0.005          0.001
  0.5   Gauss          1.21 x 10^-2   1.29 x 10^-3   4.96 x 10^-4   5.42 x 10^-5
  0.5   t        8     1.20           1.65           1.94           3.01
  0.5   t        4     1.39           2.22           2.79           4.86
  0.5   t        3     1.50           2.55           3.26           5.83
  0.7   Gauss          1.95 x 10^-2   2.67 x 10^-3   1.14 x 10^-3   1.60 x 10^-4
  0.7   t        8     1.11           1.33           1.46           1.86
  0.7   t        4     1.21           1.60           1.82           2.52
  0.7   t        3     1.27           1.74           2.01           2.83

Table 5.3. Joint 1% quantile exceedance probabilities for multivariate Gaussian and t
equicorrelation copulas with correlation parameter values of 0.5 and 0.7. For Gauss copulas
the probability of joint quantile exceedance is given; for the t copulas the factors by
which the Gaussian probability must be multiplied are given.

                            Dimension d
  rho   Copula   nu    2              3              4              5
  0.5   Gauss          1.29 x 10^-3   3.66 x 10^-4   1.49 x 10^-4   7.48 x 10^-5
  0.5   t        8     1.65           2.36           3.09           3.82
  0.5   t        4     2.22           3.82           5.66           7.68
  0.5   t        3     2.55           4.72           7.35           10.34
  0.7   Gauss          2.67 x 10^-3   1.28 x 10^-3   7.77 x 10^-4   5.35 x 10^-4
  0.7   t        8     1.33           1.58           1.78           1.95
  0.7   t        4     1.60           2.10           2.53           2.91
  0.7   t        3     1.74           2.39           2.97           3.45


Example 5.34 (joint quantile exceedances: an interpretation). Consider daily
returns on five stocks. Suppose we are unsure about the best multivariate elliptical
model for these returns, but we believe that the correlation between any two
returns on the same day is 50%. If returns follow a multivariate Gaussian distribution,
then the probability that on any day all returns are below the 1% quantiles of
their respective distributions is $7.48 \times 10^{-5}$. In the long run such an event will happen
once every 13 369 trading days on average, that is roughly once every 51.4 years
(assuming 260 trading days in a year). On the other hand, if returns follow a multivariate
t distribution with four degrees of freedom, then such an event will happen
7.68 times more often, that is roughly once every 6.7 years. In the life of a risk
manager, 50-year events and 7-year events have a very different significance.
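
The arithmetic behind Example 5.34 is short enough to spell out. The sketch takes the Gaussian probability and the t inflation factor from the $d = 5$, $\rho = 0.5$ entries of Table 5.3; the variable names are illustrative.

```python
p_gauss = 7.48e-5          # Table 5.3: Gauss copula, rho = 0.5, d = 5
ratio_t4 = 7.68            # Table 5.3: t copula with nu = 4, rho = 0.5, d = 5
trading_days_per_year = 260

days_gauss = 1.0 / p_gauss                        # mean waiting time in trading days
years_gauss = days_gauss / trading_days_per_year
years_t4 = years_gauss / ratio_t4                 # event is ratio_t4 times more frequent

print(round(days_gauss), round(years_gauss, 1), round(years_t4, 1))
```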


5.3.2 Rank Correlations 


To calculate rank correlations for normal variance mixture copulas we use the fol- 
lowing preliminary result for elliptical distributions. 


Proposition 5.35. Let $X \sim E_2(0, \Sigma, \psi)$ and $\rho = \wp(\Sigma)_{12}$, where $\wp$ denotes the
correlation operator in (3.5). Assume $P(X = 0) = 0$. Then

\[ P(X_1 > 0, X_2 > 0) = \frac{1}{4} + \frac{\arcsin \rho}{2\pi}. \]

Proof. First we make a standardization of the variables and observe that if $Y \sim
E_2(0, P, \psi)$ and $P = \wp(\Sigma)$, then $P(X_1 > 0, X_2 > 0) = P(Y_1 > 0, Y_2 > 0)$.
Now introduce a pair of spherical variates $Z \sim S_2(\psi)$; it follows that

\[ (Y_1, Y_2) \overset{d}{=} \left(Z_1, \rho Z_1 + \sqrt{1 - \rho^2}\, Z_2\right) \overset{d}{=} R\left(\cos\Theta, \rho \cos\Theta + \sqrt{1 - \rho^2} \sin\Theta\right), \]

where $R$ is a positive radial rv and $\Theta$ is an independent, uniformly distributed angle
on $[-\pi, \pi)$ (see Section 3.3.1 and Theorem 3.22). Let $\phi = \arcsin \rho$ and observe that
$\sin \phi = \rho$ and $\cos \phi = \sqrt{1 - \rho^2}$. Since $P(R = 0) = P(X = 0) = 0$ we conclude
that

\[ P(X_1 > 0, X_2 > 0) = P(\cos\Theta > 0, \sin\phi \cos\Theta + \cos\phi \sin\Theta > 0) = P(\cos\Theta > 0, \sin(\Theta + \phi) > 0). \]

The angle $\Theta$ must jointly satisfy $\Theta \in (-\frac{1}{2}\pi, \frac{1}{2}\pi)$ and $\Theta + \phi \in (0, \pi)$ and it is
easily seen that for any value of $\phi$ this has probability $(\frac{1}{2}\pi + \phi)/(2\pi)$, which gives
the result.


Theorem 5.36 (rank correlations for Gauss copula). Let $X$ have a bivariate meta-Gaussian
distribution with copula $C_\rho^{Ga}$ and continuous margins. Then the rank correlations
are

\[ \rho_\tau(X_1, X_2) = \frac{2}{\pi} \arcsin \rho, \tag{5.32} \]

\[ \rho_S(X_1, X_2) = \frac{6}{\pi} \arcsin \tfrac{1}{2}\rho. \tag{5.33} \]

Proof. Since rank correlation is a copula property we can of course simply assume
that $X \sim N_2(0, P)$, where $P$ is a correlation matrix with off-diagonal element $\rho$;
the calculations are then easy. For Kendall’s tau, formula (5.26) implies

\[ \rho_\tau(X_1, X_2) = 4P(Y_1 > 0, Y_2 > 0) - 1, \]

where $Y = \tilde{X} - X$ and $\tilde{X}$ is an independent copy of $X$. Since $Y \sim N_2(0, 2P)$,
by the convolution property of the multivariate normal in Section 3.1.3, we have that
$\rho(Y_1, Y_2) = \rho$ and formula (5.32) follows from Proposition 5.35. For Spearman’s
rho we observe that (5.25) implies

\[ \rho_S(X_1, X_2) = 12 \int_0^1 \int_0^1 P(\Phi(X_1) \le u_1, \Phi(X_2) \le u_2)\, du_1\, du_2 - 3 \]
\[ = 3\left(4 \int_0^1 \int_0^1 P(X_1 \le \Phi^{-1}(u_1), X_2 \le \Phi^{-1}(u_2))\, du_1\, du_2 - 1\right) \]
\[ = 3\left(4 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} P(X_1 \le x_1, X_2 \le x_2)\, \phi(x_1) \phi(x_2)\, dx_1\, dx_2 - 1\right), \]

where $x_i := \Phi^{-1}(u_i)$ and $\phi$ is the standard normal density. Now let $Z_1$ and $Z_2$
denote two standard normal variates, independent of $X$ and of each other. We see
that

\[ \rho_S(X_1, X_2) = 3(4E(P(X_1 \le Z_1, X_2 \le Z_2 \mid Z_1, Z_2)) - 1) \]
\[ = 3(4P(X_1 \le Z_1, X_2 \le Z_2) - 1) \]
\[ = 3(4P(Y_1 > 0, Y_2 > 0) - 1), \]

where $Y = Z - X$. Since $Y \sim N_2(0, P + I_2)$, the formula (5.33) follows from
Proposition 5.35.

Figure 5.10. The solid line shows the relationship between Spearman’s rho and the correlation
parameter $\rho$ of the Gauss copula $C_\rho^{Ga}$ for meta-Gaussian rvs with continuous dfs; this
is very close to the line y = x, which is just visible as a dotted line. The dashed line shows
the relationship between Kendall’s tau and $\rho$; this relationship holds for the copulas of other
normal variance mixture distributions with correlation parameter $\rho$, such as the t copula $C_{\nu,\rho}^t$.
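
Formulas (5.32) and (5.33) are easy to evaluate numerically. The following sketch computes both rank correlations for a given $\rho$, inverts (5.32), and measures on a grid how far Spearman’s rho in (5.33) deviates from $\rho$ itself. The function names and the grid spacing are illustrative assumptions.

```python
import math

def kendall_tau_gauss(rho):
    """Kendall's tau of the Gauss copula, formula (5.32)."""
    return 2.0 / math.pi * math.asin(rho)

def spearman_rho_gauss(rho):
    """Spearman's rho of the Gauss copula, formula (5.33)."""
    return 6.0 / math.pi * math.asin(rho / 2.0)

def rho_from_kendall_tau(tau):
    """Invert (5.32): the correlation parameter implied by a Kendall's tau value."""
    return math.sin(math.pi * tau / 2.0)

grid = [i / 1000.0 for i in range(-1000, 1001)]
max_gap = max(abs(spearman_rho_gauss(r) - r) for r in grid)

print(kendall_tau_gauss(0.5), spearman_rho_gauss(0.5))
print("largest |rho_S - rho| on the grid:", max_gap)
```

The largest gap on the grid is below 0.02, so Spearman’s rho and the correlation parameter of a Gauss copula are nearly interchangeable, whereas Kendall’s tau is not.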


These relationships between the rank correlations and $\rho$ are illustrated in Figure 5.10.
Note that the right-hand side of (5.33) may be approximated by the value
$\rho$ itself. This approximation turns out to be very accurate, as shown in the figure:
the error satisfies $|6 \arcsin(\rho/2)/\pi - \rho| \le (\pi - 3)|\rho|/\pi$, and its maximum over
$\rho \in [-1, 1]$ is approximately 0.0181.

The relationship between Kendall’s tau and the correlation parameter of the Gauss
copula $C_\rho^{Ga}$ expressed by (5.32) holds more generally for the copulas of essentially
all normal variance mixture distributions, such as the t copula $C_{\nu,\rho}^t$. This is implied
by the following general result for elliptical distributions, which was used to derive
an alternative correlation estimator for bivariate distributions in Section 3.3.4.


Proposition 5.37. Let $X \sim E_2(0, P, \psi)$ for a correlation matrix $P$ with off-diagonal
element $\rho$, and assume that $P(X = 0) = 0$. Then the relationship $\rho_\tau(X_1, X_2) =
(2/\pi) \arcsin \rho$ holds.

Proof. The result relies on the convolution property of elliptical distributions
in (3.54). Setting $Y = \tilde{X} - X$, where $\tilde{X}$ is an independent copy of $X$, we note
that $Y \sim E_2(0, P, \tilde{\psi})$ for some characteristic generator $\tilde{\psi}$. We need to evaluate
$\rho_\tau(X_1, X_2) = 4P(Y_1 > 0, Y_2 > 0) - 1$ as in the proof of Theorem 5.36, but
Proposition 5.35 shows that $P(Y_1 > 0, Y_2 > 0)$ takes the same value whenever $Y$
is elliptical.


Remark 5.38. The relationship (5.33) between Spearman’s rho and linear corre- 
lation does not hold for all elliptical distributions. A counterexample is found in 
Hult and Lindskog (2002). Simple formulas for elliptical distributions other than 
the Gaussian, such as the multivariate t, are not known to us. 


5.3.3 Skewed Normal Mixture Copulas


A skewed normal mixture copula is the copula of any normal mixture distribution 
that is not elliptically symmetric. An example is provided by the skewed t copula, 
which is the copula of the distribution whose density is given in (3.32). 

A random vector $X$ with a skewed t distribution and $\nu$ degrees of freedom is
denoted $X \sim GH_d(-\frac{1}{2}\nu, \nu, 0, \mu, \Sigma, \gamma)$ in the notation of Section 3.2.3. Its marginal
distributions satisfy $X_i \sim GH_1(-\frac{1}{2}\nu, \nu, 0, \mu_i, \Sigma_{ii}, \gamma_i)$ (from Proposition 3.13) and
its copula depends on $\nu$, $P = \wp(\Sigma)$ and $\gamma$ and will be denoted by $C_{\nu,P,\gamma}^t$ or, in the
bivariate case, $C_{\nu,\rho,\gamma_1,\gamma_2}^t$. Random sampling from the skewed t copula follows the
same approach as for the t copula in Algorithm 5.10.

Algorithm 5.39 (simulation of skewed t copula).

(1) Generate $X \sim GH_d(-\frac{1}{2}\nu, \nu, 0, 0, P, \gamma)$ using Algorithm 3.10.

(2) Return $U = (F_1(X_1), \ldots, F_d(X_d))'$, where $F_i$ is the distribution function of
a $GH_1(-\frac{1}{2}\nu, \nu, 0, 0, 1, \gamma_i)$ distribution. The random vector $U$ has df $C_{\nu,P,\gamma}^t$.

Note that the evaluation of $F_i$ requires the numerical integration of the density of a
skewed univariate t distribution.


To appreciate the flexibility of the skewed t copula it suffices to consider the
bivariate case for different values of the skewness parameters $\gamma_1$ and $\gamma_2$. In Figure 5.11
we have plotted simulated points from nine different examples of this
copula. Part (e) corresponds to the case when $\gamma_1 = \gamma_2 = 0$ and is thus the ordinary
t copula. All other pictures show copulas which are non-radially symmetric (see
Section 5.1.5), as is obvious by rotating each picture 180° about the point $(\frac{1}{2}, \frac{1}{2})$;
(c), (e) and (g) show exchangeable copulas satisfying (5.18), while the remaining
six are non-exchangeable.

Figure 5.11. Ten thousand simulated points from bivariate skewed t copula $C_{\nu,\rho,\gamma_1,\gamma_2}^t$
for $\nu = 5$, $\rho = 0.8$ and various values of the parameters $\gamma = (\gamma_1, \gamma_2)$: (a) $\gamma = (0.8, -0.8)$;
(b) $\gamma = (0.8, 0)$; (c) $\gamma = (0.8, 0.8)$; (d) $\gamma = (0, -0.8)$; (e) $\gamma = (0, 0)$; (f) $\gamma = (0, 0.8)$;
(g) $\gamma = (-0.8, -0.8)$; (h) $\gamma = (-0.8, 0)$; and (i) $\gamma = (-0.8, 0.8)$.

Obviously the main advantage of the skewed t copula over the ordinary t copula is
that its asymmetry allows us to have different levels of tail dependence in “opposite 
corners” of the distribution. In the context of market risk it is often claimed that joint 
negative returns on stocks show more tail dependence than joint positive returns. 


5.3.4 Grouped Normal Mixture Copulas 


Technically speaking, a grouped normal mixture copula is not itself the copula of a
normal mixture distribution, but rather a way of attaching together a set of normal
mixture copulas. We will illustrate the idea by considering the grouped t copula.
Here, the basic idea is to construct a copula for a random vector $X$ such that certain
subvectors of $X$ have t copulas but quite different levels of tail dependence.

We create a distribution using a generalization of the variance-mixing construction
$X = \sqrt{W} Z$ in (3.19). Rather than multiplying all components of a correlated
Gaussian vector $Z$ with the root of a single inverse-gamma-distributed variate $W$,
as in Example 3.7, we instead multiply different subgroups with different variates
$W_j$, where $W_j \sim Ig(\frac{1}{2}\nu_j, \frac{1}{2}\nu_j)$ and the $W_j$ are themselves comonotonic (see Section 5.1.6).
Thus we create subgroups whose dependence properties are described
by t copulas with different $\nu_j$ parameters.

Like the t copula, the skewed t copula and anything based on a mixture of multivariate
normals, a grouped t copula is easy to simulate and thus to use in Monte
Carlo risk studies; this has been a major motivation for its development. We formally
define the grouped t copula by explaining in more detail how to generate a
random vector $\mathbf{U}$ with that distribution.


Algorithm 5.40 (simulation of grouped t copula).

(1) Generate independently $Z \sim N_d(0, P)$ and $U \sim U(0, 1)$.

(2) Partition $\{1, \ldots, d\}$ into $m$ subsets of sizes $s_1, \ldots, s_m$, and for $k = 1, \ldots, m$
let $\nu_k$ be the degrees-of-freedom parameter associated with group $k$.

(3) Set $W_k = G_{\nu_k}^{-1}(U)$, where $G_\nu$ is the df of the univariate $Ig(\frac{1}{2}\nu, \frac{1}{2}\nu)$ distribution,
so that $W_1, \ldots, W_m$ are comonotonic and inverse-gamma-distributed
variates.

(4) Construct vectors $\mathbf{X}$ and $\mathbf{U}$ by

\[ \mathbf{X} = (\sqrt{W_1} Z_1, \ldots, \sqrt{W_1} Z_{s_1}, \sqrt{W_2} Z_{s_1+1}, \ldots, \sqrt{W_2} Z_{s_1+s_2}, \ldots, \sqrt{W_m} Z_d), \]
\[ \mathbf{U} = (t_{\nu_1}(X_1), \ldots, t_{\nu_1}(X_{s_1}), t_{\nu_2}(X_{s_1+1}), \ldots, t_{\nu_2}(X_{s_1+s_2}), \ldots, t_{\nu_m}(X_d)). \]

The former has a grouped t distribution and the latter is distributed according
to a grouped t copula.
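
The four steps of Algorithm 5.40 can be sketched in a few dozen lines. To keep the example free of third-party dependencies, it assumes even integer degrees of freedom (so that the chi-square distribution function has a finite closed form), inverts that function by bisection, and evaluates the t distribution function by numerical integration. All function names, the equicorrelation matrix and the group parameters are illustrative assumptions; a production implementation would more likely rely on a statistics library for the quantile and distribution functions.

```python
import math
import random

def chi2_cdf_even(x, nu):
    """Chi-square df for even nu = 2k, via the finite Poisson-sum form."""
    half, term, total = x / 2.0, 1.0, 1.0
    for j in range(1, nu // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total

def invgamma_quantile(u, nu):
    """G_nu^{-1}(u) for W ~ Ig(nu/2, nu/2), using the fact that nu / W is chi-square(nu)."""
    target, lo, hi = 1.0 - u, 0.0, 1.0
    while chi2_cdf_even(hi, nu) < target:
        hi *= 2.0
    for _ in range(200):                       # bisection for the chi-square quantile
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return nu / (0.5 * (lo + hi))

def t_cdf(x, nu, n=200):
    """Student t df via x = sqrt(nu) * tan(theta) and Simpson's rule."""
    const = math.gamma((nu + 1) / 2.0) / (math.sqrt(math.pi) * math.gamma(nu / 2.0))
    upper = math.atan(abs(x) / math.sqrt(nu))
    h = upper / n
    total = 1.0 + math.cos(upper) ** (nu - 1)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (nu - 1)
    return 0.5 + math.copysign(const * total * h / 3.0, x)

def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low

def grouped_t_copula_sample(corr, group_sizes, group_nus, rng):
    """One draw (X, U) following steps (1)-(4) of Algorithm 5.40."""
    d = len(corr)
    chol = cholesky(corr)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    z = [sum(chol[i][k] * g[k] for k in range(i + 1)) for i in range(d)]  # step (1): Z
    u = rng.random()                                                      # step (1): U
    x, u_vec, i = [], [], 0
    for size, nu in zip(group_sizes, group_nus):                          # step (2)
        w = invgamma_quantile(u, nu)                                      # step (3)
        for _ in range(size):                                             # step (4)
            x.append(math.sqrt(w) * z[i])
            u_vec.append(t_cdf(x[-1], nu))
            i += 1
    return x, u_vec

rng = random.Random(7)                                   # arbitrary seed
corr = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
draws = [grouped_t_copula_sample(corr, (2, 2), (4, 20), rng)[1] for _ in range(300)]
print(draws[0])
```

Because every $W_k$ is computed from the same uniform variate, the mixing variables are comonotonic, which is the point of step (3).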


If we have an a priori idea of the desired group structure, we can calibrate the
grouped t copula to data using a method based on Kendall’s tau rank correlations.
The use of this method for the ordinary t copula is described later in Section 5.5.1
and Example 5.54.


Notes and Comments 


The coefficient of tail dependence for the t copula was first derived in Embrechts, 
McNeil and Straumann (2002). A more general result for the copulas of elliptical 
distributions is given in Hult and Lindskog (2002) and will be discussed in Sec- 
tion 7.3.3. The formula for Kendall’s tau for elliptical distributions can be found in 
Lindskog, McNeil and Schmock (2003) and Fang and Fang (2002). 

The skewed t copula was introduced in Demarta and McNeil (2005), which also
describes the grouped t copula. The grouped t copula and a method for its calibration
were first proposed in Daul et al. (2003).

Table 5.4. Table summarizing the generators, permissible parameter values and limiting
special cases for four selected Archimedean copulas. The case $\theta = 0$ should be taken to mean
the limit $\lim_{\theta \to 0} \phi_\theta(t)$. For the Clayton and Frank copulas this limit is $-\ln t$, which is the
generator of the independence copula.

  Copula           Generator $\phi(t)$                                         Parameter range      Strict             Lower    Upper
  $C_\theta^{Gu}$  $(-\ln t)^\theta$                                            $\theta \ge 1$       Yes                $\Pi$    $M$
  $C_\theta^{Cl}$  $\frac{1}{\theta}(t^{-\theta} - 1)$                          $\theta \ge -1$      $\theta \ge 0$     $W$      $M$
  $C_\theta^{Fr}$  $-\ln\left(\frac{e^{-\theta t} - 1}{e^{-\theta} - 1}\right)$  $\theta \in \mathbb{R}$  Yes            $W$      $M$

5.4 Archimedean Copulas 


The Gumbel copula (5.11) and the Clayton copula (5.12) belong to the family of so- 
called Archimedean copulas, which has been very extensively studied. This family 
has proved useful for modelling portfolio credit risk, as will be seen in Example 8.9. 
In this section we look at the simple structure of these copulas and establish some 
of the properties that we will need. 


5.4.1 Bivariate Archimedean Copulas 


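As well as the Gumbel and Clayton copulas, two further examples we consider are
the Frank copula

\[ C_\theta^{Fr}(u_1, u_2) = -\frac{1}{\theta} \ln\left(1 + \frac{(\exp(-\theta u_1) - 1)(\exp(-\theta u_2) - 1)}{\exp(-\theta) - 1}\right), \quad \theta \in \mathbb{R}, \]

and a two-parameter copula that we refer to as a generalized Clayton copula:

\[ C_{\theta,\delta}^{GC}(u_1, u_2) = \left\{\left((u_1^{-\theta} - 1)^\delta + (u_2^{-\theta} - 1)^\delta\right)^{1/\delta} + 1\right\}^{-1/\theta}, \quad \theta \ge 0,\ \delta \ge 1. \]

It may be verified that, provided the parameter $\theta$ lies in the ranges we have specified
in the copula definitions, all four examples that we have met have the form

\[ C(u_1, u_2) = \phi^{-1}(\phi(u_1) + \phi(u_2)), \tag{5.34} \]

where $\phi$ is a decreasing function from $[0, 1]$ to $[0, \infty]$, satisfying $\phi(0) = \infty$,
$\phi(1) = 0$, known as the generator of the copula, and $\phi^{-1}$ is its inverse. For example,
for the Gumbel copula $\phi(t) = (-\ln t)^\theta$ for $\theta \ge 1$, and for the other copulas the
generators $\phi(t)$ are given in Table 5.4.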

When we introduced the Clayton copula in (5.12) we insisted that its parameter
should be positive. Note that it is in fact possible to have a Clayton copula
with $-1 \le \theta < 0$, although in this case the construction (5.34) must be generalized
slightly. Suppose, for example, that $\theta = -\frac{1}{2}$; the Clayton copula generator
$\phi(t) = \theta^{-1}(t^{-\theta} - 1)$ is then a strictly decreasing function mapping $[0, 1]$ into $[0, 2]$.
If we attempt to evaluate (5.34) in, say, the point $u_1 = u_2 = 0.16$, we have a problem
since $\phi(u_1) + \phi(u_2) = 2.4$ and $\phi^{-1}(2.4)$ is undefined.

To obtain a copula in a case when $\phi(0) < \infty$ we introduce a so-called pseudo-inverse
of the generator and give a theorem that explains exactly when a construction
resembling (5.34) yields a copula.


Definition 5.41 (pseudo-inverse). Suppose $\phi : [0, 1] \to [0, \infty]$ is continuous and
strictly decreasing with $\phi(1) = 0$ and $\phi(0) \le \infty$. We define a pseudo-inverse of $\phi$
with domain $[0, \infty]$ by

\[ \phi^{[-1]}(t) = \begin{cases} \phi^{-1}(t), & 0 \le t \le \phi(0), \\ 0, & \phi(0) < t \le \infty. \end{cases} \tag{5.35} \]

Theorem 5.42 (bivariate Archimedean copula). Let $\phi : [0, 1] \to [0, \infty]$ be
continuous and strictly decreasing with $\phi(1) = 0$ and $\phi^{[-1]}(t)$ as in (5.35). Then

\[ C(u_1, u_2) = \phi^{[-1]}(\phi(u_1) + \phi(u_2)) \tag{5.36} \]

is a copula if and only if $\phi$ is convex.

Proof. See Nelsen (1999, pp. 91, 92).

All copulas constructed according to (5.36) are called bivariate Archimedean
copulas. If $\phi(0) = \infty$ the generator is said to be strict and we may replace the
pseudo-inverse $\phi^{[-1]}$ by the ordinary functional inverse $\phi^{-1}$ as in (5.34). In summary
we have the following.

Definition 5.43 (Archimedean copula generator). A continuous, strictly decreasing,
convex function $\phi : [0, 1] \to [0, \infty]$ satisfying $\phi(1) = 0$ is known as an
Archimedean copula generator. It is known as a strict generator if $\phi(0) = \infty$.


In Table 5.4 we indicate when the generators of the four Archimedean copulas 
are strict and give the lower and upper limits of the families as the parameter $\theta$ goes
to the boundaries of the parameter space. Both the Frank and Clayton copulas are 
known as comprehensive copulas, since they interpolate between a lower limit of 
countermonotonicity and an upper limit of comonotonicity. For a more extensive 
table of Archimedean copulas see Nelsen (1999). 


Remark 5.44. Consider again the Clayton copula with $\theta = -\frac{1}{2}$ and non-strict
generator $\phi(t) = -2(\sqrt{t} - 1)$. The copula may be written as $C(u_1, u_2) =
\max\{(u_1^{0.5} + u_2^{0.5} - 1)^2, 0\}$ and this “maximum-with-zero” notation is the common
way of writing Archimedean copulas with non-strict generators. The countermonotonicity
copula is a further example; it is an Archimedean copula with non-strict
generator $\phi(t) = 1 - t$.
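
The role of the pseudo-inverse can be made concrete with the Clayton generator for $\theta = -\frac{1}{2}$. The sketch below implements (5.35) and (5.36) for that family and compares the result with the maximum-with-zero form of Remark 5.44. The function names and the second evaluation point are illustrative.

```python
def clayton_generator(t, theta):
    """phi(t) = (t ** -theta - 1) / theta, for theta >= -1 and theta != 0."""
    return (t ** (-theta) - 1.0) / theta

def clayton_pseudo_inverse(s, theta):
    """phi^[-1](s) as in (5.35); equals 0 beyond phi(0) when the generator is non-strict."""
    if theta < 0 and s >= -1.0 / theta:        # phi(0) = -1/theta is finite for theta < 0
        return 0.0
    return (1.0 + theta * s) ** (-1.0 / theta)

def archimedean_copula(u1, u2, theta):
    """Bivariate construction (5.36) with the Clayton generator."""
    s = clayton_generator(u1, theta) + clayton_generator(u2, theta)
    return clayton_pseudo_inverse(s, theta)

def clayton_max_form(u1, u2):
    """Maximum-with-zero form of Remark 5.44 for theta = -1/2."""
    return max((u1 ** 0.5 + u2 ** 0.5 - 1.0) ** 2, 0.0)

theta = -0.5
low_point = archimedean_copula(0.16, 0.16, theta)    # phi sum is 2.4 > phi(0) = 2
mid_point = archimedean_copula(0.64, 0.64, theta)
print(low_point, clayton_max_form(0.16, 0.16))
print(mid_point, clayton_max_form(0.64, 0.64))
```

At $u_1 = u_2 = 0.16$ the sum of generator values exceeds $\phi(0) = 2$, so the pseudo-inverse returns zero where the ordinary inverse would be undefined.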


Kendall’s rank correlations can be calculated for Archimedean copulas directly
from the generator using Proposition 5.45 below. The formula obtained can be used
to calibrate Archimedean copulas to empirical data using the sample version of
Kendall’s tau, as we discuss in Section 5.5.

Table 5.5. Kendall’s rank correlations and coefficients of tail dependence for the copulas
of Table 5.4. $D_1(\theta)$ is the Debye function $D_1(\theta) = \theta^{-1} \int_0^\theta t/(\exp(t) - 1)\, dt$.

  Copula                   $\rho_\tau$                                     $\lambda_u$          $\lambda_l$
  $C_\theta^{Gu}$          $1 - 1/\theta$                                  $2 - 2^{1/\theta}$   $0$
  $C_\theta^{Cl}$          $\theta/(\theta + 2)$                           $0$                  $2^{-1/\theta}$ if $\theta > 0$; $0$ if $\theta \le 0$
  $C_\theta^{Fr}$          $1 - 4\theta^{-1}(1 - D_1(\theta))$             $0$                  $0$
  $C_{\theta,\delta}^{GC}$ $\frac{(2 + \theta)\delta - 2}{(2 + \theta)\delta}$  $2 - 2^{1/\delta}$   $2^{-1/(\theta\delta)}$

Proposition 5.45. Let $X_1$ and $X_2$ be continuous rvs with unique Archimedean
copula $C$ generated by $\phi$. Then

\[ \rho_\tau(X_1, X_2) = 1 + 4 \int_0^1 \frac{\phi(t)}{\phi'(t)}\, dt. \tag{5.37} \]

Proof. See Nelsen (1999, p. 130).


For the closed-form copulas of the Archimedean class, coefficients of tail dependence
are easily calculated using methods of the kind used in Example 5.31. Values
for Kendall’s tau and the coefficients of tail dependence for the copulas of Table 5.4
are given in Table 5.5. It is interesting to note that the generalized Clayton copula
$C_{\theta,\delta}^{GC}$ subsumes, in a sense, both Gumbel’s family and the strict part of Clayton’s
family, and thus succeeds in having tail dependence in both tails.
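
Formula (5.37) can be checked numerically against the closed-form entries of Table 5.5. The sketch below approximates the integral with the midpoint rule, which avoids evaluating the generators at the endpoints 0 and 1; the function name, the grid size and the choice $\theta = 2$ are illustrative assumptions.

```python
import math

def kendall_tau_from_generator(phi, dphi, n=20000):
    """Approximate (5.37), 1 + 4 * integral of phi / phi', by the midpoint rule on (0, 1)."""
    h = 1.0 / n
    integral = sum(phi((i + 0.5) * h) / dphi((i + 0.5) * h) for i in range(n)) * h
    return 1.0 + 4.0 * integral

theta = 2.0

# Gumbel generator (-ln t) ** theta and its derivative
gumbel_tau = kendall_tau_from_generator(
    lambda t: (-math.log(t)) ** theta,
    lambda t: -theta * (-math.log(t)) ** (theta - 1.0) / t,
)

# Clayton generator (t ** -theta - 1) / theta and its derivative
clayton_tau = kendall_tau_from_generator(
    lambda t: (t ** (-theta) - 1.0) / theta,
    lambda t: -(t ** (-theta - 1.0)),
)

print(gumbel_tau, 1.0 - 1.0 / theta)          # Table 5.5: 1 - 1/theta
print(clayton_tau, theta / (theta + 2.0))     # Table 5.5: theta / (theta + 2)
```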


5.4.2 Multivariate Archimedean Copulas 


It seems natural to attempt to construct a higher-dimensional Archimedean copula
according to $C(u_1, \ldots, u_d) = \phi^{[-1]}(\phi(u_1) + \cdots + \phi(u_d))$. However, this construction
may fail to define a proper distribution function for arbitrary dimension $d$.
An example where this occurs is obtained if we take the generator $\phi(t) = 1 - t$,
which is not strict. In this case we obtain the Fréchet lower bound for copulas, which
is not itself a copula for $d > 2$.

A necessary condition for the d-dimensional construction to succeed in all dimensions
is that $\phi$ should be a strict Archimedean copula generator, although this is not
sufficient. It was shown by Kimberling (1974) that if $\phi : [0, 1] \to [0, \infty]$ is a strict
Archimedean copula generator, then

\[ C(u_1, \ldots, u_d) = \phi^{-1}(\phi(u_1) + \cdots + \phi(u_d)) \tag{5.38} \]


gives acopula in any dimension d if and only if the generator inverse | : [0, oo] > 
[0, 1] is completely monotonic. A decreasing function f (t) is completely monotonic 
on an interval [a, b] if it satisfies 


k 
ok ro >0, keN, te(a,b). (5.39) 
All of the generators in Table 5.4 have inverses which are completely monotonic
on $[0, \infty]$ (if we restrict to $\theta > 0$ for the Clayton copula) and all extend to arbitrary
dimensions using the construction (5.38). For example, a d-dimensional Clayton
copula is

$$C^{Cl}_\theta(\boldsymbol{u}) = (u_1^{-\theta} + \cdots + u_d^{-\theta} - d + 1)^{-1/\theta}, \quad \theta \ge 0, \tag{5.40}$$


where the limiting case $\theta = 0$ should be interpreted as the d-dimensional independence
copula.

Another way of describing these Archimedean copulas which extend to arbitrary
dimensions is in terms of Laplace-Stieltjes transforms of dfs on $\mathbb{R}^+$, since every
completely monotonic function mapping from $[0, \infty]$ to $[0, 1]$ can be expressed in
terms of such transforms. Let $G$ be a df on $\mathbb{R}^+$ satisfying $G(0) = 0$ with
Laplace-Stieltjes transform


$$\hat{G}(t) = \int_0^\infty e^{-tx}\,dG(x), \quad t \ge 0. \tag{5.41}$$


If we define $\hat{G}(\infty) := 0$, it is not difficult to verify that $\hat{G} : [0, \infty] \to [0, 1]$ is a
continuous, strictly decreasing function with the property of complete monotonicity
(5.39). It therefore provides a candidate for an Archimedean generator inverse.

In the following result we show how Laplace-Stieltjes transforms are used to 
construct random vectors whose distributions are multivariate Archimedean copulas. 
In so doing, we also reveal how such copulas may be simulated. 


Proposition 5.46. Let $G$ be a df on $\mathbb{R}^+$ satisfying $G(0) = 0$ with Laplace-Stieltjes
transform $\hat{G}$ as in (5.41) and set $\hat{G}(\infty) := 0$. Let $V$ be an rv with df $G$ and let
$U_1, \ldots, U_d$ be a sequence of rvs that are conditionally independent given $V$ with
conditional distribution function given by $F_{U_i \mid V}(u \mid v) = \exp(-v \hat{G}^{-1}(u))$ for
$u \in [0, 1]$. Then


$$P(U_1 \le u_1, \ldots, U_d \le u_d) = \hat{G}(\hat{G}^{-1}(u_1) + \cdots + \hat{G}^{-1}(u_d)), \tag{5.42}$$

so that the df of $\boldsymbol{U} = (U_1, \ldots, U_d)'$ is an Archimedean copula with generator
$\phi = \hat{G}^{-1}$.


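Proof. We have

$$
\begin{aligned}
P(U_1 \le u_1, \ldots, U_d \le u_d)
&= \int_0^\infty P(U_1 \le u_1, \ldots, U_d \le u_d \mid V = v)\,dG(v) \\
&= \int_0^\infty \prod_{j=1}^d F_{U_j \mid V}(u_j \mid v)\,dG(v) \\
&= \int_0^\infty \exp(-v(\hat{G}^{-1}(u_1) + \cdots + \hat{G}^{-1}(u_d)))\,dG(v) \\
&= \hat{G}(\hat{G}^{-1}(u_1) + \cdots + \hat{G}^{-1}(u_d)).
\end{aligned}
$$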


Because of the importance of such copulas, particularly in the field of credit risk, 
we will call these copulas LT-Archimedean (LT stands for “Laplace transform”) and 
make the following definition. 
Definition 5.47 (LT-Archimedean copula). An LT-Archimedean copula is a copula
of the form (5.38), where $\phi$ is the inverse of the Laplace-Stieltjes transform of a df
$G$ on $\mathbb{R}^+$ satisfying $G(0) = 0$.


In the following algorithm we explain how to sample from such copulas using 
Proposition 5.46 and give explicit instructions for the Clayton, Gumbel and Frank 
copulas. 


Algorithm 5.48 (simulation of LT-Archimedean copulas). 


(1) Generate a variate $V$ with df $G$ such that $\hat{G}$, the Laplace-Stieltjes transform
of $G$, is the inverse of the generator $\phi$ of the required copula.

(2) Generate independent uniform variates $X_1, \ldots, X_d$.

(3) Return $\boldsymbol{U} = (\hat{G}(-\ln(X_1)/V), \ldots, \hat{G}(-\ln(X_d)/V))'$.

(a) For the special case of the Clayton copula we generate a gamma variate
$V \sim \mathrm{Ga}(1/\theta, 1)$ with $\theta > 0$ (see Section A.2.4). The df of $V$ has Laplace transform
$\hat{G}(t) = (1 + t)^{-1/\theta}$. Note that the inverse $\hat{G}^{-1}(t) = t^{-\theta} - 1$ differs from the
generator in Table 5.4 by a constant factor that is unimportant.

(b) For the special case of the Gumbel copula we generate a positive stable variate
$V \sim \mathrm{St}(1/\theta, 1, \gamma, 0)$, where $\gamma = (\cos(\pi/(2\theta)))^\theta$ and $\theta > 1$ (see Section A.2.9
for more details and a reference to a simulation algorithm). This df
has Laplace transform $\hat{G}(t) = \exp(-t^{1/\theta})$ as desired.

(c) For the special case of the Frank copula we generate a discrete rv $V$ with
probability mass function $p(k) = P(V = k) = (1 - \exp(-\theta))^k/(k\theta)$ for
$k = 1, 2, \ldots$ and $\theta > 0$. This can be achieved by standard simulation methods
for discrete distributions (see Ripley 1987, p. 71).
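
The three steps of Algorithm 5.48 can be sketched in Python as follows, for the Clayton case (a) and the Frank case (c). This is an illustration under our own assumptions rather than a reference implementation: the function names are hypothetical, the discrete variate in case (c) is drawn by a simple sequential-search inversion, and the Laplace transform used for the Frank case, $\hat{G}(t) = -\theta^{-1}\ln(1 - (1 - e^{-\theta})e^{-t})$, is the one obtained by summing the series for the stated probability mass function. The Gumbel case (b) is omitted because it needs a positive stable generator.

```python
import math
import random


def sample_lt_archimedean(sample_v, laplace, d, rng):
    """One draw from an LT-Archimedean copula via steps (1)-(3) of Algorithm 5.48."""
    v = sample_v(rng)                                # step (1): variate V with df G
    xs = [1.0 - rng.random() for _ in range(d)]      # step (2): uniforms in (0, 1]
    return [laplace(-math.log(x) / v) for x in xs]   # step (3)


def clayton_parts(theta):
    """Case (a): V ~ Ga(1/theta, 1) with Laplace transform (1 + t)^(-1/theta)."""
    sample_v = lambda rng: rng.gammavariate(1.0 / theta, 1.0)
    laplace = lambda t: (1.0 + t) ** (-1.0 / theta)
    return sample_v, laplace


def frank_parts(theta):
    """Case (c): P(V = k) = (1 - exp(-theta))^k / (k * theta), k = 1, 2, ..."""
    p = 1.0 - math.exp(-theta)

    def sample_v(rng, max_k=10_000_000):
        # Inversion by sequential search through the probability mass function.
        u, k = rng.random(), 1
        prob = p / theta
        cum = prob
        while u > cum and k < max_k:
            k += 1
            prob *= p * (k - 1) / k  # p(k) = p(k - 1) * p * (k - 1) / k
            cum += prob
        return k

    laplace = lambda t: -math.log(1.0 - p * math.exp(-t)) / theta
    return sample_v, laplace


rng = random.Random(1)
clayton_sample = [sample_lt_archimedean(*clayton_parts(2.0), 3, rng) for _ in range(2000)]
frank_sample = [sample_lt_archimedean(*frank_parts(5.0), 3, rng) for _ in range(2000)]
print(clayton_sample[0], frank_sample[0])
```

Each row of `clayton_sample` and `frank_sample` is one simulated three-dimensional point with approximately uniform margins; a scatterplot of any two columns should show the dependence pattern of the corresponding bivariate margin.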


See Figure 5.12 for an example of data simulated from a four-dimensional Gumbel 
copula using this algorithm. Note the upper tail dependence in each bivariate margin 
of this copula. 


5.4.3 Non-exchangeable Archimedean Copulas


A copula obtained from construction (5.38) is obviously an exchangeable copula 
conforming to (5.18). While exchangeable bivariate Archimedean copulas are widely 
used in modelling applications, their exchangeable multivariate extensions represent 
a very specialized form of dependence structure and have more limited applications. 
An exception to this is in the area of credit risk, as will be seen in Chapter 8, although 
even here more general models with group structures are also needed. It is certainly 
natural to enquire whether there are extensions to the Archimedean class that are 
not rigidly exchangeable, and we devote this section to a short discussion of some 
possible extensions. 
Figure 5.12. Pairwise scatterplots of 1000 simulated points from a four-dimensional
exchangeable Gumbel copula with $\theta = 2$. Data are simulated using Algorithm 5.48.


Asymmetric bivariate copulas. Let $C_\theta$ be any exchangeable bivariate copula. Then
a parametric family of asymmetric copulas $C_{\theta,\alpha,\beta}$ is obtained by setting


$$C_{\theta,\alpha,\beta}(u_1, u_2) = u_1^{1-\alpha} u_2^{1-\beta} C_\theta(u_1^\alpha, u_2^\beta), \quad 0 \le u_1, u_2 \le 1, \tag{5.43}$$


where $0 \le \alpha, \beta \le 1$. Only in the special case $\alpha = \beta$ is the copula (5.43) exchangeable.
Note also that when both parameters are zero, $C_{\theta,0,0}$ is the independence
copula, and when both parameters are one, $C_{\theta,1,1}$ is simply $C_\theta$. When $C_\theta$ is an
Archimedean copula, we refer to copulas constructed by (5.43) as asymmetric bivariate
Archimedean copulas.

We check that (5.43) defines a copula by constructing a random vector with this 
df and observing that its margins are standard uniform. Since the construction of a 
random vector amounts to a simulation recipe, we present it as such. 


Algorithm 5.49 (asymmetric bivariate Archimedean copula). 


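(1) Generate a random pair $(V_1, V_2)$ with df $C_\theta$.

(2) Generate, independently of $V_1$, $V_2$, two independent standard uniform variates
$\tilde{U}_1$ and $\tilde{U}_2$.

(3) Return $U_1 = \max\{V_1^{1/\alpha}, \tilde{U}_1^{1/(1-\alpha)}\}$ and $U_2 = \max\{V_2^{1/\beta}, \tilde{U}_2^{1/(1-\beta)}\}$.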
Figure 5.13. Pairwise scatterplots of 10 000 simulated points from an extension of the
Gumbel copula $C^{Gu}_4$ given by $C_{4,0.95,0.7}$ in (5.43). This is simulated using Algorithm 5.49.


It may be easily verified that (U1, U2) have the df (5.43). See Figure 5.13 for an 
example of simulated data from an asymmetric copula based on Gumbel’s copula. 
Note that an alternative copula may be constructed by taking the pair of uniform variates
generated in step (2) of Algorithm 5.49 to be distributed according to some copula
other than the independence copula.
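
A minimal Python sketch of Algorithm 5.49 is given below, together with an empirical check against (5.43) at a single point. For brevity it takes $C_\theta$ to be a Clayton copula, simulated through the gamma construction of Algorithm 5.48, rather than the Gumbel copula of Figure 5.13; the function names and parameter values are illustrative assumptions, and the code requires $0 < \alpha, \beta < 1$.

```python
import random


def clayton_pair(theta, rng):
    """Bivariate Clayton pair via the gamma construction of Algorithm 5.48."""
    v = rng.gammavariate(1.0 / theta, 1.0)
    # expovariate(1.0) plays the role of -ln(X) for a standard uniform X
    return [(1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta) for _ in range(2)]


def asymmetric_pair(alpha, beta, theta, rng):
    """Steps (1)-(3) of Algorithm 5.49, assuming 0 < alpha, beta < 1."""
    v1, v2 = clayton_pair(theta, rng)        # step (1)
    w1, w2 = rng.random(), rng.random()      # step (2): independent uniforms
    u1 = max(v1 ** (1.0 / alpha), w1 ** (1.0 / (1.0 - alpha)))  # step (3)
    u2 = max(v2 ** (1.0 / beta), w2 ** (1.0 / (1.0 - beta)))
    return u1, u2


def asymmetric_cdf(u1, u2, alpha, beta, theta):
    """Right-hand side of (5.43) with C_theta the bivariate Clayton copula."""
    a, b = u1 ** alpha, u2 ** beta
    clayton = (a ** -theta + b ** -theta - 1.0) ** (-1.0 / theta)
    return u1 ** (1.0 - alpha) * u2 ** (1.0 - beta) * clayton


rng = random.Random(7)
alpha, beta, theta = 0.95, 0.7, 2.0
sample = [asymmetric_pair(alpha, beta, theta, rng) for _ in range(20000)]
empirical = sum(u1 <= 0.5 and u2 <= 0.5 for u1, u2 in sample) / len(sample)
print(empirical, asymmetric_cdf(0.5, 0.5, alpha, beta, theta))
```

The empirical proportion of points in $[0, 0.5]^2$ should be close to the value of (5.43) at $(0.5, 0.5)$, which is the verification referred to in the text carried out by simulation at one point.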


Non-exchangeable, higher-dimensional Archimedean copulas. Non-exchange- 
able, higher-dimensional Archimedean copulas with exchangeable bivariate mar- 
gins can be constructed by recursive application of Archimedean generators and 
their inverses, and we will give examples in this section. The biggest problem with 
these constructions lies in checking that they lead to valid multivariate distributions 
satisfying (5.1). The necessary theory is complicated and we will simply indicate the 
nature of the conditions that are necessary without providing justification; a com- 
prehensive reference is Joe (1997). It turns out that with some care we can construct 
situations of partial exchangeability. We give three- and four-dimensional examples 
which indicate the pattern of construction. 


Example 5.50 (three-dimensional non-exchangeable Archimedean copulas). 
Suppose that $\phi_1$ and $\phi_2$ are two strict Archimedean generators and consider


$$C(u_1, u_2, u_3) = \phi_2^{-1}(\phi_2 \circ \phi_1^{-1}(\phi_1(u_1) + \phi_1(u_2)) + \phi_2(u_3)). \tag{5.44}$$


Conditions that ensure that this is a copula are that the generator inverses $\phi_1^{-1}$ and
$\phi_2^{-1}$ are completely monotonic decreasing functions, as in (5.39), and the composition
$\phi_2 \circ \phi_1^{-1} : [0, \infty] \to [0, \infty]$ is a completely monotonic increasing function,
i.e. a function $g$ satisfying


$$(-1)^{k-1} \frac{d^k}{dt^k} g(t) \ge 0, \quad k \in \mathbb{N}.$$


Observe that when $\phi_2 = \phi_1 = \phi$ we are back in the situation of full exchangeability,
as in (5.38). Otherwise, if $\phi_1 \ne \phi_2$ and $(U_1, U_2, U_3)$ is a random vector with df given
by (5.44), then only $U_1$ and $U_2$ are exchangeable, i.e. $(U_1, U_2, U_3) \stackrel{d}{=} (U_2, U_1, U_3)$,
but no other swapping of subscripts is possible. All bivariate margins of (5.44) are
themselves Archimedean copulas. The margins $C_{13}$ and $C_{23}$ have generator $\phi_2$ and
$C_{12}$ has generator $\phi_1$.
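
To make the structure of (5.44) concrete, the sketch below evaluates the construction for two Gumbel generators, assumed to be $\phi_i(t) = (-\ln t)^{\theta_i}$, and checks the statement about the bivariate margins numerically. For this family the composition $\phi_2 \circ \phi_1^{-1}(t)$ equals $t^{\theta_2/\theta_1}$, and we assume $\theta_1 \ge \theta_2 \ge 1$ so that the monotonicity condition on the composition holds; the function names are ours.

```python
import math


def gumbel_phi(theta):
    """Assumed Gumbel generator phi(u) = (-ln u)^theta."""
    return lambda u: max(0.0, -math.log(u)) ** theta


def gumbel_phi_inv(theta):
    """Inverse of the Gumbel generator."""
    return lambda s: math.exp(-(s ** (1.0 / theta)))


def nested_gumbel(u1, u2, u3, theta1, theta2):
    """Evaluate (5.44) with Gumbel generators, assuming theta1 >= theta2 >= 1."""
    if not theta1 >= theta2 >= 1.0:
        raise ValueError("this sketch assumes theta1 >= theta2 >= 1")
    phi1, phi1_inv = gumbel_phi(theta1), gumbel_phi_inv(theta1)
    phi2, phi2_inv = gumbel_phi(theta2), gumbel_phi_inv(theta2)
    inner = phi1_inv(phi1(u1) + phi1(u2))    # bivariate Gumbel copula of (u1, u2)
    return phi2_inv(phi2(inner) + phi2(u3))  # phi2(inner) is the composition term


def gumbel2(u, v, theta):
    """Bivariate Gumbel copula, used only to check the margins."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


# Setting one argument to 1 recovers a bivariate margin.
print(nested_gumbel(0.3, 0.7, 1.0, 4.0, 2.0), gumbel2(0.3, 0.7, 4.0))  # C_12
print(nested_gumbel(0.3, 1.0, 0.6, 4.0, 2.0), gumbel2(0.3, 0.6, 2.0))  # C_13
```

The two printed pairs should agree, illustrating that $C_{12}$ has generator $\phi_1$ while $C_{13}$ has generator $\phi_2$.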


Example 5.51 (four-dimensional non-exchangeable Archimedean copulas). A 
possible four-dimensional construction is 


$$C(u_1, u_2, u_3, u_4) = \phi_3^{-1}(\phi_3 \circ \phi_1^{-1}(\phi_1(u_1) + \phi_1(u_2)) + \phi_3 \circ \phi_2^{-1}(\phi_2(u_3) + \phi_2(u_4))), \tag{5.45}$$

where $\phi_1$, $\phi_2$ and $\phi_3$ are three distinct, strict Archimedean generators and we assume
that their inverses and the composite functions $\phi_3 \circ \phi_1^{-1}$ and $\phi_3 \circ \phi_2^{-1}$ are completely
monotonic to obtain a proper distribution function. This is not the only possible
four-dimensional construction (Joe 1997), but it is a useful construction because it gives
two exchangeable groups. If $(U_1, U_2, U_3, U_4)$ has the df (5.45), then $U_1$ and $U_2$ are
exchangeable, as are $U_3$ and $U_4$.


The same kinds of construction can be extended to higher dimensions, subject 
again to complete monotonicity conditions on the compositions of generators and 
generator inverses. 


LT-Archimedean copulas with p-factor structure. Recall from Definition 5.47 the
family of LT-Archimedean copulas. It follows easily from (5.42) that these have the
form


$$C(u_1, \ldots, u_d) = E\Bigl(\exp\Bigl(-V \sum_{i=1}^d \hat{G}^{-1}(u_i)\Bigr)\Bigr) \tag{5.46}$$


for strictly positive rvs $V$ with Laplace-Stieltjes transform $\hat{G}$. It is possible to generalize
the construction (5.46) to obtain a larger family of non-exchangeable copulas,
which will be useful in the context of dynamic credit risk models (see Section 9.7). An
LT-Archimedean copula with p-factor structure is constructed from a p-dimensional
random vector $\boldsymbol{V} = (V_1, \ldots, V_p)'$ with independent, strictly positive components
and a matrix $A \in \mathbb{R}^{d \times p}$ with elements $a_{ij} > 0$ as follows:


$$C(u_1, \ldots, u_d) = E\Bigl(\exp\Bigl(-\sum_{i=1}^d \boldsymbol{a}_i' \boldsymbol{V} \hat{G}_i^{-1}(u_i)\Bigr)\Bigr), \tag{5.47}$$


where $\boldsymbol{a}_i$ is the ith row of $A$ and $\hat{G}_i$ is the Laplace-Stieltjes transform of the strictly
positive rv $\boldsymbol{a}_i' \boldsymbol{V}$.
We can write (5.47) in a different way, which facilitates the computation of
$C(u_1, \ldots, u_d)$. Note that


$$\sum_{i=1}^d \boldsymbol{a}_i' \boldsymbol{V} \hat{G}_i^{-1}(u_i) = \sum_{j=1}^p V_j \sum_{i=1}^d a_{ij} \hat{G}_i^{-1}(u_i).$$


It follows from the independence of the $V_j$ that


$$
\begin{aligned}
C(u_1, \ldots, u_d)
&= \prod_{j=1}^p E\Bigl(\exp\Bigl(-V_j \sum_{i=1}^d a_{ij} \hat{G}_i^{-1}(u_i)\Bigr)\Bigr) \\
&= \prod_{j=1}^p \hat{G}_{V_j}\Bigl(\sum_{i=1}^d a_{ij} \hat{G}_i^{-1}(u_i)\Bigr).
\end{aligned}
\tag{5.48}
$$

Note that (5.48) is easy to evaluate when $\hat{G}_{V_j}$, the Laplace-Stieltjes transform of
the $V_j$, is available in closed form, because $\hat{G}_i(t) = \prod_{j=1}^p \hat{G}_{V_j}(a_{ij} t)$ by the
independence of the $V_j$.
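
The evaluation of (5.48) can be illustrated with a short Python sketch. It assumes, purely for illustration, independent gamma-distributed factors $V_j \sim \mathrm{Ga}(\alpha_j, 1)$ with Laplace-Stieltjes transform $(1 + t)^{-\alpha_j}$, and it inverts each $\hat{G}_i$ numerically by bisection; the function names, the loading matrix and the shape parameters are hypothetical choices and not part of the text.

```python
import math


def lt_gamma(shape):
    """Laplace-Stieltjes transform of a Ga(shape, 1) rv (assumed factor law)."""
    return lambda t: (1.0 + t) ** (-shape)


def invert_decreasing(f, u):
    """Solve f(t) = u on t >= 0 by bisection; f is strictly decreasing with f(0) = 1."""
    if u >= 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while f(hi) > u:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_factor_copula(u, A, factor_lts):
    """Evaluate (5.48); each row of A needs at least one positive loading."""
    d, p = len(A), len(factor_lts)

    def row_lt(i):
        # G_i(t) = prod_j G_{V_j}(a_ij * t), the transform of a_i' V
        return lambda t: math.prod(factor_lts[j](A[i][j] * t) for j in range(p))

    s = [invert_decreasing(row_lt(i), u[i]) for i in range(d)]
    return math.prod(
        factor_lts[j](sum(A[i][j] * s[i] for i in range(d))) for j in range(p)
    )


A = [[1.0, 0.2], [0.5, 0.5], [0.1, 2.0]]   # hypothetical loadings a_ij > 0
lts = [lt_gamma(0.5), lt_gamma(1.5)]       # hypothetical factor distributions
print(p_factor_copula([0.3, 0.6, 0.8], A, lts))
```

With a single factor and unit loadings the function reduces to the exchangeable Clayton copula (5.40), which provides a simple consistency check.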


Notes and Comments 


The name Archimedean relates to an algebraic property of the copulas which resem- 
bles the Archimedean axiom for real numbers (see Nelsen 1999, p. 98). Clayton’s 
copula was introduced in Clayton (1978), although it has also been called the Cook 
and Johnson copula (see Genest and MacKay 1986) and the Pareto copula (see 
Hutchinson and Lai 1990). For Frank’s copula see Frank (1979); this copula has 
radial symmetry and is the only such Archimedean copula. 

Theorem 5.42 is a result of Alsina, Frank and Schweizer (2005). The for- 
mula for Kendall’s tau in the Archimedean family is due to Genest and MacKay 
(1986). The link between completely monotonic functions and generators which 
give Archimedean copulas of the form (5.38) is found in Kimberling (1974). See 
also Feller (1971) for more on the concept of complete monotonicity. For more 
on the important connection between Archimedean generators and Laplace trans- 
forms, see Joe (1997). For a single reference containing most of the main theory 
for bivariate Archimedean copulas and some of the results on higher-dimensional 
exchangeable Archimedean copulas consult Nelsen (1999). 

Proposition 5.46 and Algorithm 5.48 are due to Marshall and Olkin (1988). See
Frees and Valdez (1997), Schönbucher (2002), Frey and McNeil (2003) and Chapters
8 and 9 of this book for further discussion of this technique.

For more details on the asymmetric bivariate copulas obtained from construc- 
tion (5.43) and ideas for more general asymmetric copulas see Genest, Ghoudi and 
Rivest (1998). These copula classes were introduced in the PhD thesis of Khoudraji 
(1995). For additional theory concerning partially exchangeable higher-dimensional 
Archimedean copulas with exchangeable bivariate margins, see Joe (1997). LT-Archimedean
copulas with p-factor structure have been proposed by Rogge and
Schönbucher (2003) with applications in credit risk in mind.

Other copula families we have not considered include the Marshall—Olkin copulas 
(Marshall and Olkin 1967a,b) and the extremal copulas in Tiit (1996). 


5.5 Fitting Copulas to Data 


We assume that we have data vectors $\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n$ with identical distribution
function $F$, describing financial losses or financial risk factor returns; we write
$\boldsymbol{X}_t = (X_{t,1}, \ldots, X_{t,d})'$ for an individual data vector and $\boldsymbol{X} = (X_1, \ldots, X_d)'$ for
a generic random vector with df $F$. We assume further that this df $F$ has continuous
margins $F_1, \ldots, F_d$ and thus, by Sklar’s Theorem, a unique representation
$F(\boldsymbol{x}) = C(F_1(x_1), \ldots, F_d(x_d))$.

It is often very difficult, particularly in higher dimensions and in situations where 
we are dealing with skewed loss distributions or heterogeneous risk factors, to find 
a good multivariate model that describes both marginal behaviour and dependence 
structure effectively. For multivariate risk-factor return data of a similar kind, such as 
stock returns or exchange-rate returns, we have discussed useful overall models such 
as the generalized hyperbolic family of Section 3.2.3, but even in these situations 
there can be value in separating the marginal-modelling and dependence-modelling 
issues and looking at each in more detail. The copula approach to multivariate models 
facilitates this approach and allows us to consider, for example, the issue of whether 
tail dependence appears to be present in our data. 

This section is thus devoted to the problem of estimating the parameters $\theta$ of
a parametric copula $C_\theta$. The main method we consider is maximum likelihood
in Section 5.5.3. First we outline a simpler method-of-moments procedure using
sample rank correlation estimates. This method has the advantage that marginal
distributions do not need to be estimated, and consequently inference about the
copula is in a sense “margin-free”.


5.5.1 Method-of-Moments using Rank Correlation 


Depending on which particular copula we want to fit, it may be easier to use empirical 
estimates of either Spearman’s or Kendall’s rank correlation to infer an estimate for 
the copula parameter. We begin by discussing the standard estimators of both of 
these rank correlations. 

Definition 5.28 suggests that we could estimate $\rho_S(X_i, X_j)$ by calculating the
usual correlation coefficient for the pseudo-observations
$\{(F_{i,n}(X_{t,i}), F_{j,n}(X_{t,j})) : t = 1, \ldots, n\}$, where $F_{i,n}$ denotes the standard empirical df for the ith margin.
Equivalently, if we use $\operatorname{rank}(X_{t,i})$ to denote the rank of $X_{t,i}$ in $X_{1,i}, \ldots, X_{n,i}$ (i.e. its
position in the ordered sample), we can calculate the correlation coefficient for the
rank data $\{(\operatorname{rank}(X_{t,i}), \operatorname{rank}(X_{t,j}))\}$, and this gives us the Spearman’s rank correlation
coefficient:

$$r^S_{ij} = \frac{12}{n(n^2 - 1)} \sum_{t=1}^n (\operatorname{rank}(X_{t,i}) - \tfrac{1}{2}(n + 1))(\operatorname{rank}(X_{t,j}) - \tfrac{1}{2}(n + 1)). \tag{5.49}$$


We will denote by $R^S$ the matrix of pairwise Spearman’s rank correlation coefficients;
since this is the sample correlation matrix of the vectors of ranks it is clearly
a positive semidefinite matrix.

The standard estimator of Kendall’s tau $\rho_\tau(X_i, X_j)$ is Kendall’s rank correlation
coefficient:


$$r^\tau_{ij} = \binom{n}{2}^{-1} \sum_{1 \le t < s \le n} \operatorname{sign}((X_{t,i} - X_{s,i})(X_{t,j} - X_{s,j})). \tag{5.50}$$


This is clearly the empirical analogue of the theoretical Kendall’s tau in Definition 5.27.
Note that the actual evaluation of this estimator for large $n$ is time-consuming
(in comparison with Spearman’s rank correlation) because every pair of observations
must be considered. Again we can collect pairwise Kendall’s rank correlation
coefficients in a matrix $R^\tau$; by observing that this matrix may be written as


$$R^\tau = \binom{n}{2}^{-1} \sum_{1 \le t < s \le n} \operatorname{sign}(\boldsymbol{X}_t - \boldsymbol{X}_s) \operatorname{sign}(\boldsymbol{X}_t - \boldsymbol{X}_s)',$$


it is again apparent that this gives a positive semidefinite matrix.
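
The two estimators (5.49) and (5.50) can be sketched in a few lines of Python. The code below is a plain transcription of the formulas for a single pair of margins, assuming no ties in the data (as is appropriate for continuous margins); the function names and the small data set are made up for illustration.

```python
def ranks(xs):
    """Ranks 1..n of the observations (assumes no ties)."""
    order = sorted(range(len(xs)), key=lambda t: xs[t])
    r = [0] * len(xs)
    for position, t in enumerate(order, start=1):
        r[t] = position
    return r


def spearman(x, y):
    """Sample Spearman's rho as in (5.49)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    centre = 0.5 * (n + 1)
    s = sum((a - centre) * (b - centre) for a, b in zip(rx, ry))
    return 12.0 * s / (n * (n * n - 1))


def sign(z):
    """Sign function returning -1, 0 or 1."""
    return (z > 0) - (z < 0)


def kendall(x, y):
    """Sample Kendall's tau as in (5.50); visits all n(n-1)/2 pairs."""
    n = len(x)
    s = sum(
        sign((x[t] - x[u]) * (y[t] - y[u]))
        for t in range(n)
        for u in range(t + 1, n)
    )
    return s / (n * (n - 1) / 2)


x = [1.2, 2.5, 3.1, 4.8]   # made-up observations
y = [0.3, 1.9, 1.1, 2.7]
print(spearman(x, y), kendall(x, y))
```

The double loop in `kendall` makes the quadratic cost mentioned in the text explicit, whereas `spearman` only needs two sorts.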

In a series of examples we show how these sample rank correlations can be used 
to calibrate (or partially calibrate) various copulas. Obviously we assume that there 
are a priori grounds for considering the chosen copula to be an appropriate model, 
such as symmetry or the lack of it and the presence or absence of tail dependence. 
The general method will always be similar: we look for a theoretical relationship 
between one of the rank correlations and the parameters of the copula and substitute 
empirical values of the rank correlation into this relationship to get estimates of 
some or all of the copula parameters. 


Example 5.52 (bivariate Archimedean copulas with a single parameter). Suppose
our assumed model is of the form $F(x_1, x_2) = C_\theta(F_1(x_1), F_2(x_2))$, where $\theta$
is a single parameter to be estimated. For many such copulas a simple functional
relationship exists between either Kendall’s tau and $\theta$ or Spearman’s rho and $\theta$. For
specific examples consider the Gumbel, Clayton and Frank copulas of Section 5.4;
in these cases we have simple relationships of the form $\rho_\tau(X_1, X_2) = f(\theta)$, as
shown in Table 5.5. This suggests we can calibrate these copulas by first calculating
a sample value $r^\tau$ for Kendall’s tau and then solving the equation $r^\tau = f(\hat{\theta})$
for $\hat{\theta}$, assuming that $\hat{\theta}$ is a valid value in the parameter space of the copula. For
example, Gumbel’s copula is calibrated by taking $\hat{\theta} = (1 - r^\tau)^{-1}$, provided that
$r^\tau \ge 0$. Clayton’s copula interpolates between perfect negative and perfect positive
dependence and can be calibrated to any sample Kendall’s tau value in $(-1, 1)$.
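
For the Gumbel and Clayton copulas the calibration equation can be solved in closed form by inverting the Table 5.5 relationships $\rho_\tau = 1 - 1/\theta$ and $\rho_\tau = \theta/(\theta + 2)$. The following Python sketch does this; the function names and the sample value of Kendall's tau are hypothetical.

```python
def gumbel_theta(tau):
    """Invert rho_tau = 1 - 1/theta; requires 0 <= tau < 1."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("Gumbel calibration needs a sample tau in [0, 1)")
    return 1.0 / (1.0 - tau)


def clayton_theta(tau):
    """Invert rho_tau = theta / (theta + 2); requires -1 < tau < 1."""
    if not -1.0 < tau < 1.0:
        raise ValueError("Clayton calibration needs a sample tau in (-1, 1)")
    return 2.0 * tau / (1.0 - tau)


r_tau = 0.4  # hypothetical sample value of Kendall's tau
print(gumbel_theta(r_tau), clayton_theta(r_tau))
```

The Frank copula has no such closed-form inverse, because its Kendall's tau involves the Debye function, so a one-dimensional numerical root search would be needed there.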


Example 5.53 (calibrating Gauss copulas using Spearman’s rho). Suppose we
assume a meta-Gaussian model for $\boldsymbol{X}$ with copula $C^{Ga}_P$ and we wish to estimate the
correlation matrix $P$. It follows from Theorem 5.36 that


$$\rho_S(X_i, X_j) = (6/\pi) \arcsin \tfrac{1}{2}\rho_{ij} \approx \rho_{ij},$$


where the final approximation is very accurate (see Figure 5.10). This suggests we
estimate $P$ by the matrix of pairwise Spearman’s rank correlation coefficients $R^S$.


The method of Example 5.53 could be used to estimate $P$ in a t copula model
$C^t_{\nu,P}(F_1(x_1), \ldots, F_d(x_d))$, although the calibration would not be as accurate as in
the Gaussian case. The value of $\rho_S(X_i, X_j)$ in terms of $\rho_{ij}$ is not known in closed
form but simulation studies suggest that the error $|\rho_S(X_i, X_j) - \rho_{ij}|$, while still
modest, is larger than in the Gaussian case. Instead we propose a method based on
Kendall’s tau in the next example, which is based on Proposition 5.37 and could be
applied to all elliptical copulas.
Example 5.54 (calibrating t copulas using Kendall’s tau). Suppose we assume
a meta-t model for $\boldsymbol{X}$ with copula $C^t_{\nu,P}$ and we wish to estimate the correlation
matrix $P$. It follows from Proposition 5.37 that


$$\rho_\tau(X_i, X_j) = (2/\pi) \arcsin \rho_{ij},$$


so that a possible estimator of $P$ is the matrix $R^*$ with components given by
$r^*_{ij} = \sin(\tfrac{1}{2}\pi r^\tau_{ij})$. However, there is no guarantee that this componentwise
transformation of the matrix of Kendall’s rank correlation coefficients will yield a
positive-definite matrix (although in our experience it very often does). When it does not, $R^*$ can be
transformed by the eigenvalue method given in Algorithm 5.55 to obtain a positive-definite
matrix that is close to $R^*$. The remaining parameter $\nu$ of the copula could
then be estimated by maximum likelihood, as discussed in Section 5.5.3.


Algorithm 5.55 (eigenvalue method). Let R* be a so-called pseudo-correlation 
matrix, i.e. a symmetric matrix of pairwise correlation estimates with unit diagonal 
entries and off-diagonal entries in [—1, 1] that is not positive semidefinite. 


(1) Calculate the spectral decomposition $R^* = G L G'$ as in (3.67), where $L$ is
the diagonal matrix of eigenvalues and $G$ is an orthogonal matrix whose columns are
eigenvectors of $R^*$.


(2) Replace all negative eigenvalues in $L$ by small values $\delta > 0$ to obtain $\tilde{L}$.


(3) Calculate $Q = G \tilde{L} G'$, which will be symmetric and positive definite but not
a correlation matrix, since its diagonal elements will not necessarily equal
one.


(4) Return the correlation matrix $R = \wp(Q)$, where $\wp$ denotes the correlation
matrix operator defined in (3.5).
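
The four steps of Algorithm 5.55 are sketched below in Python. To keep the example free of external dependencies, the spectral decomposition in step (1) is computed with a small cyclic Jacobi routine of our own (`jacobi_eigh`, a hypothetical helper); in practice a library routine such as `numpy.linalg.eigh` would normally be used instead. The value of $\delta$ and the test matrix are illustrative assumptions.

```python
import math


def jacobi_eigh(matrix, sweeps=100, tol=1e-22):
    """Eigenvalues and eigenvectors (columns of G) of a symmetric matrix."""
    n = len(matrix)
    a = [list(row) for row in matrix]
    g = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle that sets the (p, q) entry to zero
                if a[p][p] == a[q][q]:
                    angle = math.pi / 4.0
                else:
                    angle = 0.5 * math.atan(2.0 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(angle), math.sin(angle)
                for k in range(n):  # rotate columns p and q
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # rotate rows p and q
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # accumulate the eigenvectors
                    g[k][p], g[k][q] = c * g[k][p] - s * g[k][q], s * g[k][p] + c * g[k][q]
    return [a[i][i] for i in range(n)], g


def eigenvalue_method(r_star, delta=1e-6):
    """Steps (1)-(4) of Algorithm 5.55 for a symmetric pseudo-correlation matrix."""
    n = len(r_star)
    eigenvalues, g = jacobi_eigh(r_star)                              # step (1)
    repaired = [lam if lam >= 0.0 else delta for lam in eigenvalues]  # step (2)
    q = [[sum(g[i][k] * repaired[k] * g[j][k] for k in range(n))      # step (3)
          for j in range(n)] for i in range(n)]
    return [[q[i][j] / math.sqrt(q[i][i] * q[j][j])                   # step (4)
             for j in range(n)] for i in range(n)]


# A pseudo-correlation matrix with one negative eigenvalue (illustrative values).
r_star = [[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]]
r = eigenvalue_method(r_star)
print(r)
```

The returned matrix has unit diagonal and strictly positive eigenvalues, at the price of pulling the off-diagonal entries towards values that are mutually consistent.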


In Examples 5.53 and 5.54 we saw that it is relatively easy to calibrate the Gauss 
copula and the correlation parameter matrix P of the t copula to sample rank cor- 
relations. This technique is particularly useful when we have limited multivariate 
data and formal estimation of a full multivariate model is unrealistic. Consider the 
following hypothetical example. 


Example 5.56 (fictitious risk integration situation). Suppose a company is 
divided into a number of business units that function semiautonomously. The com- 
pany management would like to calculate an enterprise-wide P&L distribution for 
a one-month period. They have historical data on monthly results for each of the 
business units for the last two years only, i.e. 24 observations. However, each busi- 
ness unit believes that through detailed knowledge of their own business going back 
over a longer period they can specify their own P&L fairly accurately. Rather than 
attempting to fit a multivariate distribution to 24 observations, the risk-management 
team decides to combine the individual marginal models provided by each of the 
business units using a matrix of rank correlations estimated from the 24 data points. 


In this situation we can build multivariate models by combining the known 
marginal distributions using any copula that can be calibrated to the estimated rank 
correlations. The Gaussian and t copulas lend themselves to this purpose and can be
used to build meta-Gaussian and meta-t models that are consistent with the available 
information. 

Typically, these models could then be used in a Monte Carlo risk analysis; we have 
seen in Section 5.1.4 that meta-Gaussian and meta-t models are particularly easy to 
simulate. Because the approach is obviously prone to model risk (24 observations 
provide very meagre multivariate data) it should be seen as a form of sensitivity 
analysis performed using detailed marginal information and only vague depend- 
ence information; we might choose to compare a meta-Gaussian model with no tail 
dependence and a meta-t model with, say, three degrees of freedom and very strong 
tail dependence. 


5.5.2 Forming a Pseudo-Sample from the Copula 


We now turn to the estimation of parametric copulas by maximum likelihood (ML). 
In practical situations we are seldom interested in the copula alone, but also require 
estimates of the margins to form a full multivariate model; even when the copula is 
of central interest, as it is for us in this chapter, we are forced to estimate margins in 
order to estimate the copula, since copula data are almost never observed directly. 

While we may attempt to estimate margins and copula in one single optimiza- 
tion, splitting the modelling into two steps can yield more insight and allow a more 
detailed analysis of the different model components. In this section we describe 
briefly some general approaches to the first step of estimating margins and construct- 
ing a pseudo-sample of observations from the copula. In the following section we 
describe how the copula parameters are estimated by ML from the pseudo-sample. 

Let $\hat{F}_1, \ldots, \hat{F}_d$ denote estimates of the marginal dfs (possible methods are
discussed below). The pseudo-sample from the copula consists of the vectors
$\hat{\boldsymbol{U}}_1, \ldots, \hat{\boldsymbol{U}}_n$, where


$$\hat{\boldsymbol{U}}_t = (\hat{U}_{t,1}, \ldots, \hat{U}_{t,d})' = (\hat{F}_1(X_{t,1}), \ldots, \hat{F}_d(X_{t,d}))'. \tag{5.51}$$

Observe that, even if the original data vectors $\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n$ are iid, the pseudo-sample
data are generally dependent, because the marginal estimates $\hat{F}_i$ will in most cases
be constructed from all of the original data vectors through the univariate samples
$X_{1,i}, \ldots, X_{n,i}$. Possible methods for obtaining the marginal estimate $\hat{F}_i$ include the
following.


(1) Parametric estimation. We choose an appropriate parametric model for the 
data in question and fit it by ML: for financial risk factor return data we might 
consider the generalized hyperbolic distribution, or one of its special cases such 
as Student t or normal inverse Gaussian (NIG); for insurance or operational loss 
data we might consider a standard actuarial loss distribution such as Pareto or 
lognormal. 


(2) Non-parametric estimation with variant of empirical df. We could estimate
$F_i$ using


$$F^*_{i,n}(x) = \frac{1}{n + 1} \sum_{t=1}^n I_{\{X_{t,i} \le x\}}, \tag{5.52}$$


which differs from the usual empirical df by the use of the denominator $n + 1$
rather than $n$. This guarantees that the pseudo-copula data in (5.51) lie strictly
in the interior of the unit cube; to implement ML we must be able to evaluate
the copula density at each $\hat{\boldsymbol{U}}_t$, and in many cases this density is infinite on the
boundary of the cube.


Figure 5.14. Pairwise scatterplots of pseudo-sample from copula for trivariate Intel,
Microsoft and General Electric log-returns (see Example 5.57).


(3) Extreme value theory for the tails. Empirical distribution functions are known 
to be poor estimators of the underlying distribution in the tails. An alternative is to 
use a technique from extreme value theory, described in Section 7.2.6, whereby 
the tails are modelled semiparametrically using a generalized Pareto distribution 
(GPD); the body of the distribution may be modelled empirically. 


Example 5.57. We analyse five years of daily log-return data (1996-2000) 
for Intel, Microsoft and General Electric stocks. The marginal distributions are 
estimated empirically (method (2)) and the pseudo-sample from the copula is 
shown in Figure 5.14. Essentially, the points are plotted at the coordinates
$(\operatorname{rank}(X_{t,i})/(n + 1), \operatorname{rank}(X_{t,j})/(n + 1))$, where $\operatorname{rank}(X_{t,i})$ denotes the rank of
$X_{t,i}$ in the sample $X_{1,i}, \ldots, X_{n,i}$.
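
The construction of the pseudo-sample with method (2) amounts to replacing each observation by its rank divided by $n + 1$, column by column. A minimal Python sketch, assuming no ties and using made-up numbers rather than the stock data of the example, is:

```python
def pseudo_sample(data):
    """Map an n x d data set to pseudo-observations rank/(n + 1), column by column."""
    n, d = len(data), len(data[0])
    u = [[0.0] * d for _ in range(n)]
    for i in range(d):
        order = sorted(range(n), key=lambda t: data[t][i])
        for position, t in enumerate(order, start=1):
            u[t][i] = position / (n + 1)  # denominator n + 1 as in (5.52)
    return u


# Made-up bivariate "returns" for illustration only.
returns = [[0.012, -0.004], [-0.021, -0.015], [0.003, 0.009], [0.017, 0.001]]
pseudo = pseudo_sample(returns)
print(pseudo)
```

Because only ranks are used, every coordinate of the result lies strictly between 0 and 1, which is what makes the copula density finite at each pseudo-observation.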
5.5.3 Maximum Likelihood Estimation


Let $C_\theta$ denote a parametric copula, where $\theta$ is the vector of parameters to be estimated.
The MLE is obtained by maximizing


$$\ln L(\theta; \hat{\boldsymbol{U}}_1, \ldots, \hat{\boldsymbol{U}}_n) = \sum_{t=1}^n \ln c_\theta(\hat{\boldsymbol{U}}_t) \tag{5.53}$$


with respect to $\theta$, where $c_\theta$ denotes the copula density as in (5.16) and $\hat{\boldsymbol{U}}_t$ denotes
a pseudo-observation from the copula.

Obviously the statistical quality of the estimates of the copula parameters depends 
very much on the quality of the estimates of the marginal distributions used in 
the formation of the pseudo-sample from the copula. The properties of estimates 
derived using the marginal estimation methods (1) and (2) in Section 5.5.2 have both 
been studied in more theoretical detail. When margins are estimated parametrically 
(method (1)), inference about the copula using (5.53) amounts to what has been 
termed the inference-functions for margins (IFM) approach by Joe (1997). When 
margins are estimated non-parametrically (method (2)), the estimates of the copula 
parameters may be regarded as semiparametric and the approach has been labelled 
pseudo-maximum likelihood by Genest and Rivest (1993) (see Notes and Comments 
for more references). One could envisage using the two-stage method to decide on 
the most appropriate copula family and then estimating all parameters (marginal 
and copula) in a final fully parametric round of estimation. 

In practice, to implement the ML method we need to derive the copula density. 
This is straightforward, if tedious, for the exchangeable Archimedean copulas of 
Section 5.4, and these have been popular models in bivariate and trivariate applica- 
tions to insurance loss data. For implicit copulas like the Gaussian and t copulas we 
use (5.17). The MLE is generally found by numerical maximization of the resulting 
log-likelihood (5.53). 


Example 5.58 (fitting the Gauss copula). In the case of a Gauss copula we
use (5.17) to see that the log-likelihood (5.53) becomes


$$\ln L(P; \hat{\boldsymbol{U}}_1, \ldots, \hat{\boldsymbol{U}}_n) = \sum_{t=1}^n \ln f_P(\Phi^{-1}(\hat{U}_{t,1}), \ldots, \Phi^{-1}(\hat{U}_{t,d})) - \sum_{t=1}^n \sum_{j=1}^d \ln \phi(\Phi^{-1}(\hat{U}_{t,j})),$$

where $f_\Sigma$ will be used to denote the joint density of a random vector with $N_d(0, \Sigma)$
distribution. It is clear that the second term is not relevant in the maximization with
respect to $P$, and the MLE is given by


$$\hat{P} = \arg\max_{\Sigma \in \mathcal{P}} \sum_{t=1}^n \ln f_\Sigma(\boldsymbol{Y}_t), \tag{5.54}$$

where $Y_{t,j} = \Phi^{-1}(\hat{U}_{t,j})$ for $j = 1, \ldots, d$ and $\mathcal{P}$ denotes the set of all possible
linear correlation matrices. To perform this maximization in practice, note that the
set $\mathcal{P}$ can be constructed as

$$\mathcal{P} = \{P = \wp(Q) : Q = AA', \ A \text{ lower triangular with ones on the diagonal}\},$$


where $\wp$ is defined in (3.5). In other words, we can search over the set of unrestricted
lower-triangular matrices with ones on the diagonal. This search is feasible
in low dimensions but very slow in high dimensions, since the number of parameters
is $O(d^2)$.

An approximate solution to the maximization may be obtained easily as follows.
Suppose that instead of maximizing over $\mathcal{P}$ as in (5.54) we maximize over the set
of all covariance matrices. This maximization problem has the analytical solution
$\hat{\Sigma} = (1/n) \sum_{t=1}^n \boldsymbol{Y}_t \boldsymbol{Y}_t'$, which is the MLE of the covariance matrix $\Sigma$ for iid normal
data with $N_d(0, \Sigma)$ distribution. In practice, $\hat{\Sigma}$ is likely to be close to being a
correlation matrix. As an approximate solution to the original problem we could
take the correlation matrix $\hat{P} = \wp(\hat{\Sigma})$.

When a Gauss copula is fitted to the trivariate data in Example 5.57 by full ML, 
the estimated correlation matrix has entries 0.58 (INTC-MSFT), 0.34 (INTC-GE) 
and 0.40 (MSFT-GE); the value of the log-likelihood at the maximum is 376.65. 
Using the alternative method gives estimates that are identical to two significant 
figures and that yield a log-likelihood value of 376.62. 

A further alternative would be to use the estimation procedure in Example 5.53, 
based on Spearman’s rank correlations. Using the Spearman method we get, respec- 
tively, 0.57, 0.34 and 0.40 for the parameter estimates; the value of the log-likelihood 
at this value of P is 376.50, which is also not so far from the maximum. 
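
The approximate solution described above, namely transforming the pseudo-sample to normal scores, forming $\hat{\Sigma}$ and rescaling it to a correlation matrix, can be sketched in Python as follows. The check uses synthetic bivariate data with a known Gauss copula (correlation 0.6 and one exponentially transformed margin), not the stock data of Example 5.57, and all function names are our own.

```python
import math
import random
from statistics import NormalDist


def pseudo_sample(data):
    """Pseudo-observations rank/(n + 1), column by column (method (2))."""
    n, d = len(data), len(data[0])
    u = [[0.0] * d for _ in range(n)]
    for i in range(d):
        order = sorted(range(n), key=lambda t: data[t][i])
        for position, t in enumerate(order, start=1):
            u[t][i] = position / (n + 1)
    return u


def gauss_copula_approx_mle(u):
    """Correlation matrix of Sigma_hat = (1/n) * sum_t Y_t Y_t'."""
    n, d = len(u), len(u[0])
    inv_cdf = NormalDist().inv_cdf
    y = [[inv_cdf(v) for v in row] for row in u]  # Y_{t,j} = Phi^{-1}(U_{t,j})
    sigma = [[sum(row[i] * row[j] for row in y) / n for j in range(d)]
             for i in range(d)]
    return [[sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j]) for j in range(d)]
            for i in range(d)]


# Synthetic data: Gauss copula with correlation 0.6, second margin lognormal.
rng = random.Random(3)
rho = 0.6
data = []
for _ in range(3000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    data.append([z1, math.exp(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)])

p_hat = gauss_copula_approx_mle(pseudo_sample(data))
print(p_hat[0][1])
```

The estimate should be close to 0.6 even though one margin is lognormal, since the pseudo-sample depends on the data only through ranks.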


Example 5.59 (fitting the t copula). In the case of the t copula, (5.17) implies that
the log-likelihood (5.53) is


$$\ln L(\nu, P; \hat{\boldsymbol{U}}_1, \ldots, \hat{\boldsymbol{U}}_n) = \sum_{t=1}^n \ln g_{\nu,P}(t_\nu^{-1}(\hat{U}_{t,1}), \ldots, t_\nu^{-1}(\hat{U}_{t,d})) - \sum_{t=1}^n \sum_{j=1}^d \ln g_\nu(t_\nu^{-1}(\hat{U}_{t,j})),$$


where $g_{\nu,P}$ denotes the joint density of a random vector with $t_d(\nu, 0, P)$ distribution,
$P$ is a linear correlation matrix, $g_\nu$ is the density of a univariate $t_1(\nu, 0, 1)$
distribution, and $t_\nu^{-1}$ is the corresponding quantile function.

Again, in relatively low dimensions, we could search over the set of correlation 
matrices P and degrees of freedom parameter v for a global maximum. For higher- 
dimensional work it would be easier to estimate P using Kendall’s tau estimates, as 
in Example 5.54, and to estimate the single parameter v by maximum likelihood. 

When a t copula is fitted to the trivariate data in Example 5.57 by full ML
the estimated matrix P has entries 0.59 (INTC-MSFT), 0.36 (INTC-GE) and 0.42 
(MSFT-GE); the estimate of v is 6.5 and the value of the log-likelihood at the max- 
imum is 420.39. Using the simpler method based on Kendall’s tau gives identical 
parameter estimates to two significant figures and a log-likelihood value of 420.32. 
Clearly, the t model fits much better than a Gauss copula model; the log-likelihood 
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is increased by over 40. This would be massively significant in a likelihood ratio 
test (although, strictly speaking, such a test introduces a technical difficulty, since 
the Gauss copula represents a boundary case of the t copula model (v = 00), which 
violates standard regularity conditions (see Notes and Comments)). 
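For orientation, the likelihood ratio statistic implied by the two full-ML fits reported above can be worked out directly. The comparison with the 95% quantile of a chi-squared distribution with one degree of freedom is only a rough benchmark, because the boundary problem means that this reference distribution does not strictly apply:

```latex
2\,\bigl(\ln L_{t} - \ln L_{\mathrm{Gauss}}\bigr) = 2\,(420.39 - 376.65) = 87.48 \;\gg\; 3.84 \approx \chi^2_{1;\,0.95}.
```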


Notes and Comments 


The copula estimation procedure based on empirical values of Kendall’s tau is dis- 
cussed in detail for bivariate Archimedean copulas by Genest and Rivest (1993); 
they explain why the procedure may be considered to be a method-of-moments 
technique and show how confidence intervals for the copula parameter (in the case 
of single-parameter copulas) may be derived. 

The method of calibrating the Gauss copula with Spearman’s rank correlation 
in Example 5.53 is essentially due to Iman and Conover (1982). The use of this 
calibration method to build meta-Gaussian models with prescribed margins and the 
Monte Carlo simulation of data from these models is implemented in the @RISK 
software (Palisade 1997), which is widely used in insurance. Our Example 5.54 is 
intended to show that this approach can be extended to meta-t models, which may 
well be more interesting due to their tail dependence. 

The eigenvalue method for correcting the positive definiteness of correlation 
matrices given in Algorithm 5.55 is described by Rousseeuw and Molenberghs 
(1993). An empirical comparison of the eigenvalue method with different approaches 
to this problem, including so-called shrinkage methods, is found in Lindskog (2000). 

The inference-functions for margins (IFM) approach to the estimation of copulas 
(method (1) of Section 5.5.2 followed by maximization of (5.53)) is described by 
Joe (1997), who gives asymptotic theory; the name of the approach (IFM) follows 
terminology of McLeish and Small (1988). 

The pseudo-likelihood approach to copula estimation (method (2) of Section 5.5.2 
followed by maximization of (5.53)) is described in Genest and Rivest (1993), and 
the consistency and asymptotic normality of the resulting parameter estimates are 
demonstrated. In Monte Carlo simulations it is found that this method outperforms 
the Kendall’s tau method for a bivariate Clayton copula (see also Genest, Ghoudi 
and Rivest 1995). 

Frees and Valdez (1997) discuss the relevance of copulas in actuarial applications 
and give an example where copulas are fitted to data using the Kendall’s tau method 
and the IFM method. Also in an insurance context, Klugman and Parsa (1999) 
discuss ML inference for copulas and bivariate goodness-of-fit tests while Chen and 
Fan (2005) describe a likelihood-ratio test for semiparametric copula selection. 

The fitting of the t copula to data and statistical aspects of testing this copula
against the Gauss copula are discussed at length in Mashal and Zeevi (2002);
the technical problem that the Gauss copula is a boundary case of the t copula is
addressed in this paper and a correction is suggested. The authors provide a number
of financial examples suggesting that extremal dependence is a feature of financial
data. Breymann, Dias and Embrechts (2003) fit various bivariate copulas to
high-frequency financial return data at different timescales and provide extensive
comparisons with respect to goodness-of-fit.

Papers developing dynamic time series models for financial return data using 
copulas include Chen and Fan (2005), Patton (2004, 2005) and Fortin and Kuzmics 
(2002). 


6 


Aggregate Risk 


This chapter is devoted to a number of theoretical concepts in quantitative risk 
management that fall under the broad heading of aggregate risk. We understand 
aggregate risk as the risk of a portfolio, which could even be the entire position in 
risky assets of a financial enterprise. The material builds on general ideas in risk 
measurement discussed in Section 2.2 and also uses in certain places the copula 
theory of Chapter 5 and some facts about elliptical distributions from Section 3.3. 

In Section 6.1 we treat the issue of measuring aggregate risk. We discuss proper- 
ties that a good measure of risk should have with particular emphasis on aggregation 
properties. This leads us to study the class of coherent risk measures. In Section 6.2 
we consider the problem of bounding an aggregate risk if we know something about 
the individual risks that contribute to the whole but have only limited information 
about their dependence. We discuss specific difficulties that arise when risk is mea- 
sured with a non-subadditive risk measure like VaR. Finally, in Section 6.3, we treat 
the subject of allocating risk capital, i.e. of distributing the risk capital for a port- 
folio to the individual risks in the portfolio. This issue is relevant for purposes of 
performance measurement, loan pricing and capital budgeting. 


6.1 Coherent Measures of Risk 


The premise of this section is the idea of approaching risk measurement by first 
writing down a list of properties that a good risk measure should have. Such a list 
was proposed for applications in financial risk management in the seminal paper by 
Artzner et al. (1999). Using economic reasoning, they specified a number of axioms 
that any so-called coherent risk measure should satisfy. Moreover, they studied the 
coherence properties of widely used risk measures such as VaR or expected shortfall 
and gave a characterization of all coherent risk measures in terms of generalized 
scenarios. Our development of the subject will follow their approach. It should 
be mentioned that the idea of axiomatic systems for risk measures bears some 
relationship to similar systems for premium principles in the actuarial literature, 
which have a long and independent history (see, for example, Goovaerts et al. (2003) 
and further references in Notes and Comments). 


6.1.1 The Axioms of Coherence 


In order to introduce the axioms of coherence we have to give a formal definition 
of risk measures. Fix some probability space $(\Omega, \mathcal{F}, P)$ and a time horizon $\Delta$.
Denote by $L^0(\Omega, \mathcal{F}, P)$ the set of all rvs on $(\Omega, \mathcal{F})$, which are almost surely finite.
Financial risks are represented by a set $\mathcal{M} \subset L^0(\Omega, \mathcal{F}, P)$ of rvs, which we interpret
as portfolio losses over some time horizon $\Delta$. The time horizon is left unspecified
and will only enter when specific problems are considered. We often assume that
$\mathcal{M}$ is a convex cone, i.e. that $L_1 \in \mathcal{M}$ and $L_2 \in \mathcal{M}$ implies that $L_1 + L_2 \in \mathcal{M}$ and
$\lambda L_1 \in \mathcal{M}$ for every $\lambda > 0$. Risk measures are real-valued functions $\varrho : \mathcal{M} \to \mathbb{R}$
defined on such cones of rvs, satisfying certain properties.

We interpret $\varrho(L)$ as the amount of capital that should be added to a position with
loss given by L, so that the position becomes acceptable to an external or internal
risk controller. Positions with $\varrho(L) \le 0$ are acceptable without injection of capital;
if $\varrho(L) < 0$, capital may even be withdrawn. Note that our interpretation of L
differs from that in Artzner et al. (1999), where an rv $L \in \mathcal{M}$ is interpreted as the
future value (instead of the loss) of a position currently held. This leads to some
sign changes in the discussion of the axioms of coherence compared with other
presentations in the literature. Also note that in order to simplify the presentation
we set interest rates equal to zero so that there is no discounting.

Now we can introduce the axioms that a risk measure $\varrho : \mathcal{M} \to \mathbb{R}$ on a convex
cone $\mathcal{M}$ should satisfy in order to be called coherent.


Axiom 6.1 (translation invariance). For all $L \in \mathcal{M}$ and every $l \in \mathbb{R}$ we have
$\varrho(L + l) = \varrho(L) + l$.


Axiom 6.1 states that by adding or subtracting a deterministic quantity l to a 
position leading to the loss L we alter our capital requirements by exactly that 
amount. The axiom is in fact necessary for the risk-capital interpretation of $\varrho$ to
make sense. Consider a position with loss L and $\varrho(L) > 0$. Adding the amount
of capital $\varrho(L)$ to the position leads to the adjusted loss $\tilde{L} = L - \varrho(L)$, with
$\varrho(\tilde{L}) = \varrho(L) - \varrho(L) = 0$, so that the position $\tilde{L}$ is acceptable without further
injection of capital.


Axiom 6.2 (subadditivity). For all $L_1, L_2 \in \mathcal{M}$ we have
$\varrho(L_1 + L_2) \le \varrho(L_1) + \varrho(L_2)$.


The rationale behind Axiom 6.2 is summarized by Artzner et al. in the statement
that “a merger does not create extra risk” (ignoring of course any problematic
practical aspects of a merger!). Axiom 6.2 is the most debated of the four axioms 
characterizing coherent risk measures, probably because it rules out VaR as a risk 
measure in certain situations. We provide some arguments explaining why subad- 
ditivity is indeed a reasonable requirement. 


• Subadditivity reflects the idea that risk can be reduced by diversification, a
time-honoured principle in finance and economics. In particular, we will see
in Section 6.1.5 that the use of non-subadditive risk measures in a Markowitz-type
portfolio optimization problem may lead to optimal portfolios that are
very concentrated and that would be deemed quite risky by normal economic
standards.

• If a regulator uses a non-subadditive risk measure in determining the regulatory
capital for a financial institution, that institution has an incentive to legally
break up into various subsidiaries in order to reduce its regulatory capital
requirements. Similarly, if the risk measure used by an organized exchange
in determining the margin requirements of investors is non-subadditive, an
investor could reduce the margin he has to pay by opening a different account
for every position in his portfolio.

• Subadditivity makes decentralization of risk-management systems possible.
Consider as an example two trading desks with positions leading to losses
$L_1$ and $L_2$. Imagine that a risk manager wants to ensure that $\varrho(L)$, the risk
of the overall loss $L = L_1 + L_2$, is smaller than some number M. If he
uses a risk measure $\varrho$, which is subadditive, he may simply choose bounds
$M_1$ and $M_2$ such that $M_1 + M_2 \le M$ and impose on each of the desks the
constraint that $\varrho(L_i) \le M_i$; subadditivity of $\varrho$ then ensures automatically
that $\varrho(L) \le M_1 + M_2 \le M$.


Axiom 6.3 (positive homogeneity). For all $L \in \mathcal{M}$ and every $\lambda > 0$ we have
$\varrho(\lambda L) = \lambda \varrho(L)$.


Axiom 6.3 is easily justified if we assume that Axiom 6.2 holds. Subadditivity
implies that, for $n \in \mathbb{N}$,

$$\varrho(nL) = \varrho(L + \cdots + L) \le n \varrho(L). \tag{6.1}$$

Since there is no netting or diversification between the losses in this portfolio,
it is natural to require that equality should hold in (6.1), which leads to positive
homogeneity. Note that subadditivity and positive homogeneity imply that the risk
measure $\varrho$ is convex on $\mathcal{M}$.
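The convexity claim follows in one line. For two losses in the cone and a weight $\lambda \in (0, 1)$ (the cases $\lambda = 0$ and $\lambda = 1$ are trivial), subadditivity gives the inequality and positive homogeneity the equality:

```latex
\varrho\bigl(\lambda L_1 + (1-\lambda) L_2\bigr)
  \le \varrho(\lambda L_1) + \varrho\bigl((1-\lambda) L_2\bigr)
  = \lambda \varrho(L_1) + (1-\lambda) \varrho(L_2).
```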


Axiom 6.4 (monotonicity). For $L_1, L_2 \in \mathcal{M}$ such that $L_1 \le L_2$ almost surely we
have $\varrho(L_1) \le \varrho(L_2)$.


From an economic viewpoint this axiom is obvious: positions that lead to higher 
losses in every state of the world require more risk capital. 

For a risk measure satisfying Axioms 6.2 and 6.3, the monotonicity axiom is
equivalent to the requirement that $\varrho(L) \le 0$ for all $L \le 0$. To see this, observe
that Axiom 6.4 implies that if $L \le 0$, then $\varrho(L) \le \varrho(0) = 0$; the latter equality
follows from Axiom 6.3 since $\varrho(0) = \varrho(\lambda 0) = \lambda \varrho(0)$ for all $\lambda > 0$. Conversely, if
$L_1 \le L_2$ and we assume that $\varrho(L_1 - L_2) \le 0$, then $\varrho(L_1) = \varrho(L_1 - L_2 + L_2) \le
\varrho(L_1 - L_2) + \varrho(L_2)$ by Axiom 6.2, which implies that $\varrho(L_1) \le \varrho(L_2)$.


Definition 6.5 (coherent risk measure). A risk measure $\varrho$ whose domain includes
the convex cone $\mathcal{M}$ is called coherent (on $\mathcal{M}$) if it satisfies Axioms 6.1-6.4.


Note that the domain is an integral part of the definition of a coherent risk measure. 
We will often encounter functionals on $L^0(\Omega, \mathcal{F}, P)$, which are coherent only if
restricted to a sufficiently small convex cone M. 
Remark 6.6 (convex measures of risk). Axiom 6.3 (positive homogeneity) has
been criticized and, in particular, it has been suggested that for large values of the
multiplier $\lambda$ we should have $\varrho(\lambda L) > \lambda \varrho(L)$ to penalize a concentration of risk and
the ensuing liquidity problems. As shown in (6.1), this is impossible for a subadditive
risk measure. This problem has led to the study of the larger class of convex risk
measures. In this class the conditions of subadditivity and positive homogeneity have
been relaxed; instead one requires only the weaker property of convexity, i.e. for all
$L_1, L_2 \in \mathcal{M}$:

$$\varrho(\lambda L_1 + (1 - \lambda) L_2) \le \lambda \varrho(L_1) + (1 - \lambda) \varrho(L_2), \quad \lambda \in [0, 1]. \tag{6.2}$$

The economic justification of (6.2) is again the idea that diversification reduces
risk. Within the class of convex risk measures it is possible to find risk measures
penalizing concentration of risk in the sense that $\varrho(\lambda L) > \lambda \varrho(L)$ for $\lambda > 1$. Convex
risk measures have recently attracted a lot of attention: some references are provided
in Notes and Comments.
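One standard example from the literature on convex risk measures, not discussed in the text itself, is the entropic risk measure $(1/\theta) \ln E(e^{\theta L})$ with $\theta > 0$. The sketch below, for a discrete loss, is only meant to illustrate how a convex but not positive-homogeneous risk measure can charge more than twice the capital for twice the position. The parameter value `theta = 0.01` and the function name are hypothetical; the loss distribution is that of a single defaultable bond paying 105 on a price of 100 with default probability 2%.

```python
import math

def entropic_risk(outcomes, probs, theta):
    """(1/theta) * ln E[exp(theta * L)] for a discrete loss L."""
    mgf = sum(p * math.exp(theta * x) for x, p in zip(outcomes, probs))
    return math.log(mgf) / theta

losses = [-5.0, 100.0]   # loss is -5 without default, 100 with default
probs = [0.98, 0.02]
theta = 0.01             # hypothetical risk-aversion parameter

rho_single = entropic_risk(losses, probs, theta)
rho_double = entropic_risk([2 * x for x in losses], probs, theta)
rho_shifted = entropic_risk([x + 10.0 for x in losses], probs, theta)
```

Here `rho_double` exceeds `2 * rho_single`, while `rho_shifted` equals `rho_single + 10`, so translation invariance is retained even though positive homogeneity is lost.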


In the following sections we study the coherence properties of several popular 
risk measures. 


6.1.2 Value-at-Risk 


It is immediately seen from the representation of VaR as a quantile of the loss 
distribution in Section 2.2.2 that VaR is translation invariant, positive homogeneous 
and monotone on $L^0(\Omega, \mathcal{F}, P)$. However, as the following example shows, the
subadditivity property (Axiom 6.2) fails to hold for VaR in general, so VaR is not a 
coherent risk measure. 


Example 6.7 (VaR for a portfolio of defaultable bonds). Consider a portfolio of
d = 100 defaultable corporate bonds. We assume that defaults of different bonds are
independent; the default probability is identical for all bonds and is equal to 2%. The
current price of the bonds is 100. If there is no default, a bond pays in t + 1 (one year
from now, say) an amount of 105; otherwise there is no repayment. Hence $L_i$, the
loss of bond i, is equal to 100 when the bond defaults and to $-5$ otherwise. Denote by
$Y_i$ the default indicator of firm i, i.e. $Y_i$ is equal to one if bond i defaults in [t, t + 1]
and equal to zero otherwise. We get $L_i = 100 Y_i - 5(1 - Y_i) = 105 Y_i - 5$. Hence the
$L_i$ form a sequence of iid rvs with $P(L_i = -5) = 0.98$ and $P(L_i = 100) = 0.02$.

We compare two portfolios, both with current value equal to 10 000. Portfolio A 
is fully concentrated and consists of 100 units of bond one. Portfolio B is completely 
diversified: it consists of one unit of each of the bonds. Economic intuition suggests 
that portfolio B is less risky than portfolio A and hence should have a lower VaR. 
Let us compute VaR at a confidence level of 95% for both portfolios. 

For portfolio A the portfolio loss is given by $L_A = 100 L_1$, so $\mathrm{VaR}_{0.95}(L_A) =
100\,\mathrm{VaR}_{0.95}(L_1)$. Now $P(L_1 \le -5) = 0.98 \ge 0.95$ and $P(L_1 \le l) = 0 < 0.95$
for $l < -5$. Hence $\mathrm{VaR}_{0.95}(L_1) = -5$, and therefore $\mathrm{VaR}_{0.95}(L_A) = -500$. This
means that even after a withdrawal of a risk capital of 500 the portfolio is still
acceptable to a risk controller working with VaR at the 95% level.

For portfolio B we have

$$L_B = \sum_{i=1}^{100} L_i = 105 \sum_{i=1}^{100} Y_i - 500,$$

and hence $\mathrm{VaR}_\alpha(L_B) = 105\, q_\alpha\bigl(\sum_{i=1}^{100} Y_i\bigr) - 500$. The sum $M := \sum_{i=1}^{100} Y_i$ has a
binomial distribution $M \sim B(100, 0.02)$. We get by inspection that $P(M \le 5) \approx
0.984 \ge 0.95$ and $P(M \le 4) \approx 0.949 < 0.95$, so $q_{0.95}(M) = 5$. Hence
$\mathrm{VaR}_{0.95}(L_B) = 525 - 500 = 25$. In this case a bank would need an additional
risk capital of 25 to satisfy a regulator working with VaR at the 95% level. Clearly,
the risk capital required for portfolio B is higher than for portfolio A.

This illustrates that measuring risk with VaR can lead to nonsensical results. 
Moreover, our example shows that VaR is not subadditive in general. In fact, for any 
coherent risk measure $\varrho$, which depends only on the distribution of L, we get

$$\varrho\Bigl(\sum_{i=1}^{100} L_i\Bigr) \le \sum_{i=1}^{100} \varrho(L_i) = 100 \varrho(L_1) = \varrho(100 L_1).$$

Hence any coherent risk measure, which depends only on the loss distribution, will
lead to a higher risk-capital requirement for portfolio A than for portfolio B.
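The two VaR figures of Example 6.7 can be reproduced with a few lines of standard-library Python. This is a minimal sketch; the helper names `binom_cdf` and `binom_quantile` are illustrative, and the quantile is taken as the smallest k whose cumulative probability reaches the confidence level.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(M <= k) for M ~ B(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def binom_quantile(alpha, n, p):
    """Smallest k with P(M <= k) >= alpha."""
    for k in range(n + 1):
        if binom_cdf(k, n, p) >= alpha:
            return k
    return n

alpha, p_default, d = 0.95, 0.02, 100

# Portfolio A: 100 units of bond one, loss 100 * (105 * Y_1 - 5).
var_a = 100 * (105 * binom_quantile(alpha, 1, p_default) - 5)

# Portfolio B: one unit of each bond, loss 105 * M - 500 with M ~ B(100, 0.02).
var_b = 105 * binom_quantile(alpha, d, p_default) - 500

print(var_a, var_b)  # -500 25
```

The concentrated portfolio receives the lower VaR because its single default event has probability 2%, which lies beyond the 95% quantile and is therefore invisible to VaR at that level.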


In Example 6.7 the non-subadditivity of VaR is caused by the fact that the assets
making up the portfolio have very skewed loss distributions; such a situation can 
clearly occur if we have defaultable bonds or options in our portfolio. Note, however, 
that the assets in this example have an innocuous dependence structure because they 
are independent. We will see in Example 6.22 in Section 6.2 that non-subadditivity 
can also occur when the loss distributions of the individual assets are smooth and 
symmetric, but their dependence structure or copula is of a special, highly asym- 
metric form. Finally, non-subadditivity of VaR also occurs when the underlying rvs 
are independent but very heavy-tailed; see Example 7 in Embrechts, McNeil and 
Straumann (2002) and Example 5.2.7 in Denuit and Charpentier (2004), which both 
use infinite-mean Pareto risks. 

VaR is, however, subadditive in the idealized situation where all portfolios can 
be represented as linear combinations of the same set of underlying elliptically 
distributed risk factors. In this case both the marginal loss distributions of the risk 
factors and the copula possess strong symmetry. We have seen in Chapter 3 that 
an elliptical model may be a reasonable approximate model for various kinds of 
risk-factor data, such as stock or exchange-rate returns. 


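Theorem 6.8 (subadditivity of VaR for elliptical risk factors). Suppose that $X \sim
E_d(\mu, \Sigma, \psi)$ and define the set $\mathcal{M}$ of linearized portfolio losses of the form

$$\mathcal{M} = \Bigl\{ L : L = \lambda_0 + \sum_{i=1}^{d} \lambda_i X_i, \ \lambda_i \in \mathbb{R} \Bigr\}.$$

Then for any two losses $L_1, L_2 \in \mathcal{M}$ and $0.5 \le \alpha < 1$,

$$\mathrm{VaR}_\alpha(L_1 + L_2) \le \mathrm{VaR}_\alpha(L_1) + \mathrm{VaR}_\alpha(L_2).$$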
Proof. Without any loss of generality we assume that $\lambda_0 = 0$. For any $L \in \mathcal{M}$ it
follows from Definition 3.26 that we can write $L = \lambda' X \stackrel{d}{=} \lambda' A Y + \lambda' \mu$ for a
spherical random vector $Y \sim S_k(\psi)$, a matrix $A \in \mathbb{R}^{d \times k}$ and a constant vector $\mu \in \mathbb{R}^d$.
By part (3) of Theorem 3.19 we have

$$L \stackrel{d}{=} \|\lambda' A\| Y_1 + \lambda' \mu, \tag{6.3}$$

showing that every $L \in \mathcal{M}$ is an rv of the same type. Moreover, the translation
invariance and homogeneity of VaR imply that, for $L = \lambda' X$,

$$\mathrm{VaR}_\alpha(L) = \|\lambda' A\| \, \mathrm{VaR}_\alpha(Y_1) + \lambda' \mu. \tag{6.4}$$

Now set $L_1 = \lambda_1' X$ and $L_2 = \lambda_2' X$. Since $\|(\lambda_1 + \lambda_2)' A\| \le \|\lambda_1' A\| + \|\lambda_2' A\|$ and
since $\mathrm{VaR}_\alpha(Y_1) \ge 0$ for $\alpha \ge 0.5$, the result follows.
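A numerical check of Theorem 6.8 is easy in the multivariate normal case, which is one member of the elliptical family. There a linear loss is itself normal, so its VaR is the mean plus the standard deviation times the standard normal quantile. The mean vector, covariance matrix and weight vectors below are made up for illustration, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MU = [0.01, -0.02, 0.00]          # hypothetical mean vector of X
SIGMA = [[0.04, 0.01, 0.00],      # hypothetical covariance matrix of X
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]

def var_linear(lam, alpha):
    """VaR_alpha of L = lam'X when X ~ N(MU, SIGMA)."""
    mean = sum(l * m for l, m in zip(lam, MU))
    variance = sum(lam[i] * SIGMA[i][j] * lam[j]
                   for i in range(len(lam)) for j in range(len(lam)))
    return mean + sqrt(variance) * NormalDist().inv_cdf(alpha)

def subadditivity_gap(lam1, lam2, alpha):
    """VaR(L1) + VaR(L2) - VaR(L1 + L2); non-negative means subadditive."""
    lam_sum = [a + b for a, b in zip(lam1, lam2)]
    return (var_linear(lam1, alpha) + var_linear(lam2, alpha)
            - var_linear(lam_sum, alpha))

lam1, lam2 = [1.0, -2.0, 0.5], [-0.5, 3.0, 1.0]
gaps = {a: subadditivity_gap(lam1, lam2, a) for a in (0.3, 0.5, 0.9, 0.99)}
```

The gap is positive for the confidence levels above 0.5, zero (up to rounding) at 0.5, and negative at 0.3, which illustrates why the theorem restricts the confidence level to at least 0.5: below that level the quantile of the standardized risk factor is negative and the triangle inequality works in the wrong direction.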


6.1.3 Coherent Risk Measures Based on Loss Distributions


We give two examples of coherent risk measures that are based on loss distributions. 


Expected shortfall. A proof of the coherence of expected shortfall, defined in 
Definition 2.15, can be based on Lemma 2.20, which gives a representation of 
expected shortfall as the limit of the averages of upper order statistics. 


Proposition 6.9. Expected shortfall is a coherent risk measure. 


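Proof. The translation invariance, positive homogeneity and monotonicity properties
follow easily from the representation $\mathrm{ES}_\alpha = (1/(1 - \alpha)) \int_\alpha^1 \mathrm{VaR}_u(L) \, du$ and
the corresponding properties for quantiles. It remains to show subadditivity.
Consider a generic sequence of rvs $L_1, \ldots, L_n$ with associated order statistics
$L_{1,n} \ge \cdots \ge L_{n,n}$ and note that for arbitrary m satisfying $1 \le m \le n$ we have

$$\sum_{i=1}^{m} L_{i,n} = \sup\{L_{i_1} + \cdots + L_{i_m} : 1 \le i_1 < \cdots < i_m \le n\}.$$

Now consider two rvs $L$ and $\tilde{L}$ with joint df F and a sequence of iid bivariate random
vectors $(L_1, \tilde{L}_1), \ldots, (L_n, \tilde{L}_n)$ with the same df F. Writing $(L + \tilde{L})_i := L_i + \tilde{L}_i$
and $(L + \tilde{L})_{i,n}$ for an order statistic of $(L + \tilde{L})_1, \ldots, (L + \tilde{L})_n$, we observe that
we must have

$$\begin{aligned}
\sum_{i=1}^{m} (L + \tilde{L})_{i,n}
  &= \sup\{(L + \tilde{L})_{i_1} + \cdots + (L + \tilde{L})_{i_m} : 1 \le i_1 < \cdots < i_m \le n\} \\
  &\le \sup\{L_{i_1} + \cdots + L_{i_m} : 1 \le i_1 < \cdots < i_m \le n\} \\
  &\quad + \sup\{\tilde{L}_{i_1} + \cdots + \tilde{L}_{i_m} : 1 \le i_1 < \cdots < i_m \le n\} \\
  &= \sum_{i=1}^{m} L_{i,n} + \sum_{i=1}^{m} \tilde{L}_{i,n}.
\end{aligned}$$

By setting $m = [n(1 - \alpha)]$ and letting $n \to \infty$, we infer from Lemma 2.20 that
$\mathrm{ES}_\alpha(L + \tilde{L}) \le \mathrm{ES}_\alpha(L) + \mathrm{ES}_\alpha(\tilde{L})$.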
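The key inequality of the proof, that the sum of the m largest values of the combined loss cannot exceed the sum of the m largest values of each component taken separately, holds for every sample and can be checked directly. The sketch below uses simulated, deliberately dependent toy losses; the distributions, the seed and the function name are arbitrary choices for illustration.

```python
import random

def top_m_sum(values, m):
    """Sum of the m largest entries, i.e. of the m upper order statistics."""
    return sum(sorted(values, reverse=True)[:m])

random.seed(0)
n = 1000
m = 50  # integer part of n * (1 - alpha) for alpha = 0.95

# Two dependent toy losses that share a common factor.
common = [random.gauss(0, 1) for _ in range(n)]
l = [c + random.gauss(0, 1) for c in common]
l_tilde = [-0.5 * c + random.expovariate(1.0) for c in common]
total = [a + b for a, b in zip(l, l_tilde)]

# Averages of upper order statistics: empirical counterparts of ES.
lhs = top_m_sum(total, m) / m
rhs = top_m_sum(l, m) / m + top_m_sum(l_tilde, m) / m
```

Whatever the dependence between the two samples, `lhs <= rhs` holds, because the indices that maximize the combined sum are admissible, but generally not optimal, for each component on its own.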
A coherent premium principle. In Fischer (2003), a class of coherent risk measures
closely resembling certain actuarial premium principles is proposed. These risk 
measures could be useful for an insurance company that wants to compute premiums 
on a coherent basis without deviating too far from standard actuarial practice. 

Given constants $p > 1$ and $\alpha \in [0, 1)$, this coherent premium principle $\varrho_{[\alpha,p]}$
is defined as follows. Let $\mathcal{M} := L^p(\Omega, \mathcal{F}, P)$, the space of all L with $\|L\|_p :=
E(|L|^p)^{1/p} < \infty$, and define, for $L \in \mathcal{M}$,

$$\varrho_{[\alpha,p]}(L) = E(L) + \alpha \|(L - E(L))^+\|_p. \tag{6.5}$$

Under (6.5) the risk of a loss L is measured by the sum of $E(L)$, the actuarial value
of a loss, and a risk loading given by a fraction $\alpha$ of the $L^p$-norm of the positive
part of the centred loss $L - E(L)$. This loading can be written more explicitly as
$\alpha \bigl( \int_{E(L)}^{\infty} (l - E(L))^p \, dF_L(l) \bigr)^{1/p}$. The higher the values of $\alpha$ and p, the more
conservative the risk measure $\varrho_{[\alpha,p]}$ becomes.

The coherence of $\varrho_{[\alpha,p]}$ is easy to check. Translation invariance and positive
homogeneity are immediate. To prove subadditivity observe that for any two rvs X
and Y we have $(X + Y)^+ \le X^+ + Y^+$. Hence we get from Minkowski’s inequality
(the triangle inequality for the $L^p$-norm) for any two $L_1, L_2 \in \mathcal{M}$:

$$\begin{aligned}
\|(L_1 - E(L_1) + L_2 - E(L_2))^+\|_p &\le \|(L_1 - E(L_1))^+ + (L_2 - E(L_2))^+\|_p \\
  &\le \|(L_1 - E(L_1))^+\|_p + \|(L_2 - E(L_2))^+\|_p,
\end{aligned}$$

which shows that $\varrho_{[\alpha,p]}$ is subadditive. To verify monotonicity assume that $L \le 0$
almost surely; in that case we have $(L - E(L))^+ \le -E(L)$ almost surely, and hence
$\|(L - E(L))^+\|_p \le -E(L)$, so $\varrho_{[\alpha,p]}(L) \le 0$ since $\alpha < 1$.
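For a loss with finitely many outcomes, definition (6.5) reduces to a weighted sum. The following sketch evaluates it and spot-checks three of the coherence properties numerically; the function name, the choice of 0.5 for the loading fraction and 2 for the norm, and the example distributions are illustrative assumptions.

```python
def premium_principle(outcomes, probs, alpha, p):
    """E(L) + alpha * ||(L - E(L))^+||_p for a discrete loss L."""
    mean = sum(q * x for x, q in zip(outcomes, probs))
    upper = sum(q * max(x - mean, 0.0) ** p for x, q in zip(outcomes, probs))
    return mean + alpha * upper ** (1.0 / p)

# Loss of a single bond: -5 with probability 0.98, 100 with probability 0.02.
losses, probs = [-5.0, 100.0], [0.98, 0.02]

base = premium_principle(losses, probs, alpha=0.5, p=2)
scaled = premium_principle([3 * x for x in losses], probs, alpha=0.5, p=2)
shifted = premium_principle([x + 7 for x in losses], probs, alpha=0.5, p=2)

# A loss that is never positive should never require capital.
never_positive = premium_principle([-5.0, -1.0], [0.98, 0.02], alpha=0.5, p=2)
```

In this example `scaled` equals `3 * base` and `shifted` equals `base + 7`, matching positive homogeneity and translation invariance, and `never_positive` is at most zero, as the monotonicity argument requires.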


6.1.4 Coherent Risk Measures as Generalized Scenarios 


In this section we present a general class of coherent risk measures based on the idea 
of generalized scenarios; recall from Remark 2.9 that scenario-based risk measures 
are used in practice at the Chicago Mercantile Exchange. We show that if we restrict 
our attention to discrete probability spaces, then in fact all coherent risk measures 
belong to this class. It is possible to extend the idea to general (infinite) probability 
spaces but the results become somewhat more technical (see Notes and Comments 
for further references). 


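Definition 6.10. Denote by $\mathcal{P}$ a set of probability measures on our underlying
measurable space $(\Omega, \mathcal{F})$, and set $\mathcal{M}_{\mathcal{P}} := \{L : E^Q(|L|) < \infty \text{ for all } Q \in \mathcal{P}\}$.
Then the risk measure induced by the set of generalized scenarios $\mathcal{P}$ is the mapping
$\varrho_{\mathcal{P}} : \mathcal{M}_{\mathcal{P}} \to \mathbb{R}$ such that $\varrho_{\mathcal{P}}(L) := \sup\{E^Q(L) : Q \in \mathcal{P}\}$.

Proposition 6.11.
(i) For any set $\mathcal{P}$ of probability measures on $(\Omega, \mathcal{F})$ the risk measure $\varrho_{\mathcal{P}}$ is
coherent on $\mathcal{M}_{\mathcal{P}}$.

(ii) Suppose that $\Omega$ is a finite set $\{\omega_1, \ldots, \omega_d\}$ and let $\mathcal{M} = \{L : \Omega \to \mathbb{R}\}$.
Then, for any coherent risk measure $\varrho$ on $\mathcal{M}$, there is a set $\mathcal{P}$ of probability
measures on $\Omega$ such that $\varrho = \varrho_{\mathcal{P}}$.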
Proof. The proof of (i) is straightforward. The properties of translation invariance,
positive homogeneity and monotonicity follow easily from Definition 6.10. For
subadditivity observe that

$$\begin{aligned}
\sup\{E^Q(L_1 + L_2) : Q \in \mathcal{P}\} &= \sup\{E^Q(L_1) + E^Q(L_2) : Q \in \mathcal{P}\} \\
  &\le \sup\{E^Q(L_1) : Q \in \mathcal{P}\} + \sup\{E^Q(L_2) : Q \in \mathcal{P}\}.
\end{aligned}$$


The proof of (ii) is more technical and can be skipped by a reader interested mainly in 
applications. Essentially, the argument is an application of the separating hyperplane 
theorem for convex sets. 

We start with some notation. For $l \in \mathbb{R}^d$ we write $l \ge 0$ if $l_i \ge 0$ for all $1 \le i \le d$;
by $\mathbf{1} \in \mathbb{R}^d$ we denote the vector $(1, \ldots, 1)'$. Since $\Omega$ is finite, we may identify $\mathcal{M}$
with $\mathbb{R}^d$ by associating an rv L with the vector $l \in \mathbb{R}^d$ with $l_i = L(\omega_i)$, $1 \le i \le d$.
Similarly, a linear functional $\lambda$ on $\mathbb{R}^d$ with $\lambda(l) \ge 0$ for all $l \ge 0$ and $\lambda(\mathbf{1}) = 1$ can
be identified with a probability measure $P_\lambda$ on $\Omega$ via $P_\lambda(\omega_i) = \lambda(e_i)$, $e_i$ the ith unit
vector. Below we will use these identifications freely.

We have proved claim (ii) if we can show that for every rv $L_0 \in \mathcal{M}$ there is a
probability measure $Q = Q(L_0)$ such that

$$E^Q(L) \le \varrho(L) \text{ for all } L \in \mathcal{M} \quad \text{and} \quad E^Q(L_0) = \varrho(L_0). \tag{6.6}$$

In fact, in that case we may take $\mathcal{P} = \{Q(L_0) : L_0 \in \mathcal{M}\}$.

Now we turn to the proof of (6.6). If this relation holds for some $L_0$ and some
Q, it holds simultaneously for Q and all rvs of the form $a L_0 + b$, $a \in \mathbb{R}^+$, $b \in \mathbb{R}$
(by translation invariance and positive homogeneity). We may therefore assume that
$\varrho(L_0) = 1$. Define $U := \{L \in \mathcal{M} : \varrho(L) < 1\}$. As explained above we can identify
U with a subset $U \subset \mathbb{R}^d$. The set U is open (as $\varrho$ is continuous) and convex
(as $\varrho$ is coherent and hence a convex functional on $\mathcal{M}$); moreover, $l_0$ (the vector
corresponding to the rv $L_0$) does not belong to U. Using the separating hyperplane
theorem (see, for example, Rockafellar (1970) or Appendix B of Duffie (2001)) we
conclude that there is a linear functional $\lambda$ on $\mathbb{R}^d$ such that

$$\lambda(l) < \lambda(l_0) \quad \text{for all } l \in U. \tag{6.7}$$

Since $0 \in U$, it follows that $0 = \lambda(0) < \lambda(l_0)$, and we may normalize $\lambda(l_0)$ to one.
We now check that $\lambda$ induces a probability measure, i.e. that (a) $\lambda(l) \ge 0$ for all
$l \ge 0$ and (b) $\lambda(\mathbf{1}) = 1$. Note that we may write (6.7) as

$$\lambda(l) < 1 \quad \text{for all } L \text{ such that } \varrho(L) < 1. \tag{6.8}$$


To prove (a) we use that for $L \le 0$ we have $\varrho(L) \le 0$ and hence $\lambda(l) < 1$.
This implies that for $L \ge 0$ and $a > 0$ we get, using the linearity of $\lambda$, $a \lambda(l) =
-\lambda(-a l) > -1$, and hence $\lambda(l) > -1/a$. Letting a tend to $\infty$ yields (a).

To prove (b) we first note that for any constant $a < 1$ we have $\varrho(a) = a < 1$, and
hence by (6.8) we have $\lambda(a \mathbf{1}) < 1$, so $\lambda(\mathbf{1}) \le 1$. On the other hand, we get for $a > 1$
that $\varrho(2 L_0 - a) = 2 \varrho(L_0) - a = 2 - a < 1$, hence $1 > \lambda(2 l_0 - a \mathbf{1}) = 2 - a \lambda(\mathbf{1})$,
and therefore $a \lambda(\mathbf{1}) > 1$; this implies that $\lambda(\mathbf{1}) \ge 1$, and hence (b).

We now show that $Q := P_\lambda$ is the desired probability measure. For this we need
to verify (6.6), i.e. we have to show that $E_\lambda(L) \le \varrho(L)$ for all $L \in \mathcal{M}$. This is
equivalent to the implication $\varrho(L) < b \Rightarrow \lambda(l) < b$ for all $L \in \mathcal{M}$, $b \in \mathbb{R}$. Now,
by translation invariance, $\varrho(L) < b \Rightarrow \varrho(L - (b - 1)) = \varrho(L) + (1 - b) < 1$.
Hence we get from (6.8) that $1 > E_\lambda(L - (b - 1)) = E_\lambda(L) - b + 1$, and therefore
$E_\lambda(L) < b$, as required.
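On a finite sample space the risk measure of Definition 6.10 is simply the largest expected loss over a list of probability vectors. A minimal sketch with three states and three hypothetical scenarios, one of which is a pure stress scenario putting all mass on the third state; all numbers and names are made up for illustration:

```python
def scenario_risk(loss, scenarios):
    """sup over Q in scenarios of E^Q(L) on a finite sample space."""
    return max(sum(q * l for q, l in zip(Q, loss)) for Q in scenarios)

# Each scenario is a probability vector over the three states.
scenarios = [
    (0.6, 0.3, 0.1),
    (0.2, 0.5, 0.3),
    (0.0, 0.0, 1.0),   # stress scenario: point mass on state 3
]

l1 = (-2.0, 1.0, 4.0)
l2 = (3.0, -1.0, -2.0)
l_sum = tuple(a + b for a, b in zip(l1, l2))

r1, r2, r_sum = (scenario_risk(x, scenarios) for x in (l1, l2, l_sum))
r1_shifted = scenario_risk(tuple(x + 1.5 for x in l1), scenarios)
```

Here `r_sum` does not exceed `r1 + r2`, because the worst scenario for the combined position need not be the worst one for each component, and `r1_shifted` equals `r1 + 1.5`, in line with translation invariance.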


6.1.5 Mean-VaR Portfolio Optimization 


In this section we show what can happen if investors optimize the expected return
on their portfolios under some constraint on VaR in a situation where VaR is not
coherent: the portfolios resulting from such an optimization procedure exploit the
conceptual weaknesses of VaR and lead to highly risky, non-diversified allocations.
This is illustrated in the simplistic Example 6.12 below but we stress that the same
phenomenon can be observed in more realistic situations (see Notes and Comments).
At the end of this section we discuss again the idealized situation of linear portfolios
of elliptical risk factors, where VaR is coherent and where mean-VaR portfolio
optimization turns out to be equivalent to the standard Markowitz approach.


Example 6.12. Consider in the context of Example 6.7 a portfolio manager who
has an amount of capital V which can be invested in the d = 100 defaultable bonds
with current price 100. For simplicity we assume that it is not possible to borrow
additional money or to take short positions in the defaultable bonds. Denote by
$\Lambda_V := \{\lambda \in \mathbb{R}^{100} : \lambda \ge 0, \ \sum_{i=1}^{100} 100 \lambda_i = V\}$ the set of all admissible portfolios
with value V at time t. The loss of some portfolio $\lambda \in \Lambda_V$ will be denoted by $L(\lambda)$;
the expected profit of a portfolio is thus given by $E(-L(\lambda))$. We assume that the
portfolio manager determines the portfolio using a mean-VaR optimality criterion,
as follows. Given some risk-aversion coefficient $\beta > 0$, a portfolio $\lambda^*$ is chosen in
order to maximize

$$E(-L(\lambda)) - \beta \, \mathrm{VaR}_\alpha(L(\lambda)) \tag{6.9}$$

over all $\lambda \in \Lambda_V$. Portfolio optimization problems of the form (6.9) are frequently
considered in practice. Moreover, optimization problems closely related to (6.9) do 
arise implicitly in the context of risk-adjusted performance measurement; often the 
performance of trading desks within a financial institution is measured by the ratio 
of profits earned by the desk and risk capital needed as a backup against losses 
from its operations. If this risk capital is determined using VaR, traders have similar 
incentives in choosing their portfolios as if operating directly under the simple 
criterion (6.9). 

Next we determine the optimal portfolio $\lambda^*$. Since the $L_i$ are identically
distributed, every admissible portfolio $\lambda \in \Lambda_V$ has the same expected loss. Hence,
maximizing (6.9) over all admissible portfolios amounts to minimizing $\mathrm{VaR}_\alpha(L(\lambda))$
over $\Lambda_V$. Consider the case where $\alpha = 0.95$. In order to minimize $\mathrm{VaR}_\alpha(L(\lambda))$ we
should invest all funds into one bond (for example the first), as was shown in
Example 6.7.

In our symmetric situation economic intuition suggests that the optimal portfolio 
should be given by a mixture of an investment in the riskless asset and a portfolio 
consisting of an equal amount of each of the risky bonds. It can be shown that this is
indeed the case, if we replace VaR by a coherent risk measure which depends only 
on the distribution of losses such as generalized expected shortfall (see Frey and 
McNeil (2002) for details). 
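The contrast with a coherent risk measure can be made concrete by computing expected shortfall for the fully concentrated and the fully diversified bond portfolios with value 10 000. The sketch below uses ordinary expected shortfall in its integral form, $(1/(1-\alpha)) \int_\alpha^1 q_u(L) \, du$, evaluated exactly for a discrete loss; it is an illustration under that assumption and not the generalized expected shortfall treated in the cited paper. The function name is hypothetical.

```python
from math import comb

def expected_shortfall(outcomes, probs, alpha):
    """(1/(1-alpha)) * integral over (alpha, 1] of the quantile function
    of a discrete loss with the given outcomes and probabilities."""
    es, cum = 0.0, 0.0
    for x, p in sorted(zip(outcomes, probs)):
        lo, hi = max(cum, alpha), cum + p
        if hi > lo:
            es += x * (hi - lo)   # the quantile equals x on (cum, cum + p]
        cum = hi
    return es / (1 - alpha)

alpha, p_default = 0.95, 0.02

# Concentrated portfolio: 100 units of one bond, L = 100 * (105 * Y_1 - 5).
es_a = expected_shortfall([-500.0, 10000.0], [1 - p_default, p_default], alpha)

# Diversified portfolio: L = 105 * M - 500 with M ~ B(100, 0.02).
ks = range(101)
b_probs = [comb(100, k) * p_default**k * (1 - p_default)**(100 - k) for k in ks]
es_b = expected_shortfall([105.0 * k - 500.0 for k in ks], b_probs, alpha)
```

For the concentrated portfolio the integral is $(0.03 \cdot (-500) + 0.02 \cdot 10\,000) / 0.05 = 3700$, whereas the diversified portfolio yields a far smaller value, so expected shortfall ranks the two portfolios in the order that economic intuition suggests.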


Portfolio optimization for elliptical risk factors. In the elliptical world, the use of 
any positive-homogeneous, translation-invariant measure of risk to rank risks or to 
determine the optimal risk-minimizing portfolio under the condition that a certain 
return is attained is equivalent to the Markowitz approach, where the variance is used 
as the risk measure. Alternative risk measures, such as VaR or expected shortfall, 
give different numerical values, but have no effect on the management of risk. We 
make these assertions more precise in the next proposition. 


Proposition 6.13. Suppose that $X \sim E_d(\mu, \Sigma, \psi)$, with $\mathrm{var}(X_i) < \infty$ for all i.
Denote by $\mathcal{W} = \{w \in \mathbb{R}^d : \sum_{i=1}^{d} w_i = 1\}$ the set of portfolio weights. Assume that
the current value of the portfolio is V and let $L(w) = V \sum_{i=1}^{d} w_i X_i$ be the
(linearized) portfolio loss. Let $\varrho$ be a real-valued risk measure depending only on the
distribution of a risk. Suppose $\varrho$ is positive homogeneous and translation invariant.
Let $\mathcal{E} = \{w \in \mathcal{W} : -w'\mu = m\}$ be the subset of portfolios giving expected return m.
Then $\mathrm{argmin}_{w \in \mathcal{E}} \, \varrho(L(w)) = \mathrm{argmin}_{w \in \mathcal{E}} \, \mathrm{var}(L(w))$.


Proof. Recall from the proof of Theorem 6.8 that for every $w \in \mathcal{E}$ the loss
$L = L(w)$ is an rv of the same type, so $\varrho((L + mV)/\sqrt{\mathrm{var}(L)}) = k$ for some
constant k. From positive homogeneity and translation invariance it follows that
$\varrho(L) = k \sqrt{\mathrm{var}(L)} - mV$, from which it is clear that the Markowitz portfolio also
minimizes $\varrho$.


Notes and Comments 


The basic paper on coherent risk measures is Artzner et al. (1999); a non-technical 
introduction by the same authors is Artzner et al. (1997). Technical extensions such 
as the characterization of coherent risk measures on infinite probability spaces are 
given in Delbaen (2000, 2002). Example 6.7 is due to Albanese (1997) and Artzner 
et al. (1999). Different existing notions of expected shortfall are discussed in the very
readable paper by Acerbi and Tasche (2002). Expected shortfall has been indepen- 
dently studied by Rockafellar and Uryasev (2000, 2002) under the name conditional 
Value-at-Risk; in particular, these papers show that expected shortfall can be obtained 
as the value of a convex optimization problem. 

The study of convex risk measures in the context of risk management and
mathematical finance began with Föllmer and Schied (2002) (see also Frittelli and Rosazza
2002). A good treatment at advanced textbook level is given in Chapter 4 of Föllmer
and Schied (2004). Cont (2005) provides an interesting link between convex risk
measures and model risk in the pricing of derivatives. 

Our exposition in Section 6.1.5 follows Frey and McNeil (2002) closely. Related
portfolio optimization problems have been studied in Basak and Shapiro (2001),
Krokhmal, Palmquist and Uryasev (2002) and Emmer, Klüppelberg and Korn
(2001). Risk-adjusted performance measures are widely used in industry in the
context of capital budgeting and performance measurement. A good overview of current
practice is given in Chapter 14 of Crouhy, Galai and Mark (2001); an analysis of 
risk management and capital budgeting for financial institutions from an economic 
viewpoint is Froot and Stein (1998). 

There is an extensive body of economic theory related to the use of elliptical 
distributions in finance. The papers by Owen and Rabinovitch (1983), Chamberlain 
(1983) and Berk (1997) provide an entry to the area. Landsman and Valdez (2003) 
discuss the explicit calculation of the quantity $E(L \mid L \ge q_\alpha(L))$ for portfolios of
elliptically distributed risks. This coincides with expected shortfall for continuous 
loss distributions (see Proposition 2.16). 

There has been recent interest in the subject of multiperiod risk measures, which 
take into account the evolution of the final value of a position over several time 
periods and consider the effect of intermediate information and actions. Important 
papers in this area include Artzner et al. (2005), Riedel (2004) and Weber (2004). 


6.2 Bounds for Aggregate Risks 


In this section we consider the general problem of finding bounds for functionals of 
aggregate risks when marginal information about the individual risks is available. 
From a mathematical viewpoint this turns out to be a so-called Fréchet problem. 
We begin by presenting the general problem before concentrating on the problem 
of bounding the VaR of an aggregate risk. 


6.2.1 The General Fréchet Problem 


Consider a random vector $L = (L_1, \ldots, L_d)'$, representing losses associated with
various individual investments or risks, and a measurable function $\Psi : \mathbb{R}^d \to \mathbb{R}$,
representing the operation of aggregation. The rv $\Psi(L)$ is interpreted as an aggregate
financial position and typical examples are


• the total loss $S_d = \sum_{i=1}^{d} L_i$;

• the maximum loss $M_d = \max(L_1, \ldots, L_d)$;

• the excess-of-loss treaty $\sum_{i=1}^{d} (L_i - k_i)^+$ for thresholds $k_i \in \mathbb{R}^+$;

• the stop-loss treaty $(\sum_{i=1}^{d} L_i - k)^+$ for a threshold $k \in \mathbb{R}^+$; and

• a combined position $M_d I_{\{S_d > q_\alpha(S_d)\}}$.


All of these examples have an immediate interpretation in insurance and finance. 
For instance, in the context of credit risk, the last example might correspond to a 
basket position paying out the largest loss $M_d$, but only if the total loss $S_d$ exceeds
its $\alpha$-quantile $q_\alpha(S_d)$ for $\alpha$ close to one.
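
The aggregation functions listed above are simple to state in code. The following sketch evaluates each of them on one realization of the loss vector; the function names and the sample numbers are illustrative assumptions, and the quantile $q_\alpha(S_d)$ needed for the combined position is passed in as a known number rather than computed.

```python
def total_loss(losses):
    """S_d: the sum of the individual losses."""
    return sum(losses)


def maximum_loss(losses):
    """M_d: the largest individual loss."""
    return max(losses)


def excess_of_loss(losses, thresholds):
    """Sum of (L_i - k_i)^+ over all risks."""
    return sum(max(l - k, 0.0) for l, k in zip(losses, thresholds))


def stop_loss(losses, threshold):
    """(S_d - k)^+ for a single threshold k."""
    return max(sum(losses) - threshold, 0.0)


def combined_position(losses, total_quantile):
    """M_d if S_d exceeds the supplied quantile q_alpha(S_d), otherwise zero."""
    return max(losses) if sum(losses) > total_quantile else 0.0


# One hypothetical realization of (L_1, L_2, L_3).
losses = [3.0, 0.5, 7.0]
print(total_loss(losses))                        # 10.5
print(maximum_loss(losses))                      # 7.0
print(excess_of_loss(losses, [1.0, 1.0, 5.0]))   # 4.0
print(stop_loss(losses, 8.0))                    # 2.5
print(combined_position(losses, 10.0))           # 7.0
print(combined_position(losses, 12.0))           # 0.0
```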

Consider also a real-valued functional $\varrho$ depending on the distribution of $\Psi(L)$;
$\varrho$ can be interpreted as a risk measure, premium principle or pricing function. Ideally
we would like to calculate $\varrho(\Psi(L))$, but, in order to do so, we need the df of $\Psi(L)$
and hence the joint distribution of the random vector $L$. Often we are required to
work with much less information. Throughout Section 6.2 we assume that we know
the marginal dfs of the risks $L_1, \ldots, L_d$; we formalize this as Assumption (A1).


(A1) The marginal dfs $F_i$ of $L_i$, $i = 1, \ldots, d$, are given.


Of course, in practice, this really means that we have sufficient information
concerning the marginal loss distributions that we can treat them as known. In the
absence of additional information concerning the dependence of $L_1, \ldots, L_d$ we
cannot calculate $\varrho(\Psi(L))$, but we can look for numerical bounds on the risk subject
to (A1).

For a particular $\Psi$ and $\varrho$ the problem thus consists of finding lower and upper
bounds $\varrho_{\min}$ and $\varrho_{\max}$ such that, under (A1),


$$\varrho_{\min} \le \varrho(\Psi(L)) \le \varrho_{\max}.$$ (6.10)


We would like these bounds to be sharp, meaning that narrower bounds would be 
violated by some random vector L whose distribution is consistent with (A1). When 
$\Psi(L)$ represents the aggregate loss of a financial position and $\varrho$ represents a risk
measure, the analysis of this problem can be thought of as a stress-testing exercise
for risk measures with respect to the dependence structure of the individual risks
involved. The value $\varrho_{\max}$ represents the worst possible “riskiness” of the position.

The problem has a very rich history in the field of probability, where it typically 
appears under the name Fréchet problem. Indeed, its mathematics is intimately 
related to the Fréchet bounds given in Theorem 5.7 and Remark 5.8. We shall sketch 
a solution to the problem, give some examples and, in Notes and Comments, guide 
the interested reader to the existing literature for more details. 

The problem of finding the bounds in (6.10) assuming (A1) only can be reformu- 
lated as a pair of optimization problems. We are required to calculate 


$$\inf\{\varrho(\Psi(L)) : L_i \sim F_i,\ i = 1, \ldots, d\}, \qquad \sup\{\varrho(\Psi(L)) : L_i \sim F_i,\ i = 1, \ldots, d\},$$ (6.11)

where $F_1, \ldots, F_d$ are given dfs and $L_i \sim F_i$ means that $L_i$ has df $F_i$. The solutions
can be found analytically in some cases, but there also exist various numerical
techniques to solve the problems in general.

We have already encountered problems of the form (6.11) in our analysis of 
attainable correlations (see Höffding’s Theorem (Theorem 5.25)), and we revisit
this problem briefly. 


Example 6.14 (attainable correlations). Assume without loss of generality that 
we have two risks which are standardized to have mean zero and variance one. The 
problem of finding maximum and minimum correlations for fixed margins can be 
formulated as a Fréchet problem in two dimensions, where $\Psi(L_1, L_2) = L_1 L_2$
and $\varrho(\Psi(L_1, L_2)) = E(\Psi(L_1, L_2)) = \rho(L_1, L_2)$, the linear correlation coefficient
between $L_1$ and $L_2$.

Theorem 5.25 shows that the possible range of the correlations between $L_1$ and
$L_2$ over all possible bivariate models for the vector $(L_1, L_2)$ is a closed interval
$[\rho_{\min}, \rho_{\max}] \subseteq [-1, 1]$, where possibly $\rho_{\min} > -1$ and/or $\rho_{\max} < 1$. An example
where the margins were taken to be lognormal and for which $\rho_{\min} > -1$ and
$\rho_{\max} < 1$ was given in Example 5.26. Furthermore, we showed that the boundary
cases $\rho_{\min}$ and $\rho_{\max}$ are attained for countermonotonic and comonotonic risks,
respectively; this result is crucial for our discussion below. The case $\rho_{\max} = 1$ can
only occur when $L_1$ and $L_2$ are rvs of the same type, and the case $\rho_{\min} = -1$ can
only occur when $L_1$ and $-L_2$ are rvs of the same type (see Definition A.1).
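
For lognormal margins the attainable correlation interval can be written in closed form. The sketch below assumes $L_1 = e^{Z}$ and $L_2 = e^{\sigma Z}$ (comonotonic case) or $L_2 = e^{-\sigma Z}$ (countermonotonic case) with $Z$ standard normal; correlation is unchanged by the standardization assumed in the example. The formulas follow from the moments of the lognormal distribution, and the choice $\sigma = 2$ is purely illustrative.

```python
from math import exp, sqrt


def lognormal_correlation_bounds(sigma):
    """Attainable correlations for margins LN(0, 1) and LN(0, sigma^2).

    rho_max is the correlation of (exp(Z), exp(sigma * Z)) and rho_min that of
    (exp(Z), exp(-sigma * Z)), where Z is standard normal.
    """
    denom = sqrt((exp(1.0) - 1.0) * (exp(sigma ** 2) - 1.0))
    rho_min = (exp(-sigma) - 1.0) / denom
    rho_max = (exp(sigma) - 1.0) / denom
    return rho_min, rho_max


rho_min, rho_max = lognormal_correlation_bounds(2.0)
print(round(rho_min, 3), round(rho_max, 3))

# With sigma = 1 the two margins are of the same type, so rho_max equals one.
print(lognormal_correlation_bounds(1.0)[1])
```

For $\sigma = 2$ the interval is far narrower than $[-1, 1]$, which illustrates that both strict inequalities $\rho_{\min} > -1$ and $\rho_{\max} < 1$ can hold at once.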


Because of Sklar’s Theorem (Theorem 5.3), the inf and sup in (6.11) can be 
interpreted as being taken over all copulas $C$ on $[0, 1]^d$. In some situations we may
have some information concerning the dependence structure of $L$, and it is natural to
translate this dependence information into constraints on $C$; for instance, we might
take inf and sup over all copulas $C \ge C_0$, for some fixed copula $C_0$. We discuss
specific examples below. 


6.2.2 The Case of VaR 


In this section we show the type of results that are obtained in the case when 
$\varrho = \mathrm{VaR}_\alpha$. We want to find (sharp) bounds for $\mathrm{VaR}_\alpha(\Psi(L))$ given the marginal
dfs $F_i$ of $L_i$, $i = 1, \ldots, d$, and partial information on the dependence of the $L_i$
variables, in particular when $\Psi$ is the sum operator. For the interpretation of the
results it will be useful to first consider the behaviour of the VaR risk measure for 
comonotonic risks as defined in Section 5.1.6. 


Additivity of VaR for comonotonic risks. The following result summarizes addi- 
tivity of VaR. 


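Proposition 6.15. Let $0 < \alpha < 1$ and $L_1, \ldots, L_d$ be comonotonic rvs with dfs
$F_1, \ldots, F_d$ which are continuous and strictly increasing. Then


$$\mathrm{VaR}_\alpha(L_1 + \cdots + L_d) = \mathrm{VaR}_\alpha(L_1) + \cdots + \mathrm{VaR}_\alpha(L_d).$$ (6.12)


Proof. For ease of notation take $d = 2$. From Proposition 5.16 we have that
$(L_1, L_2) \stackrel{d}{=} (F_1^{\leftarrow}(U), F_2^{\leftarrow}(U))$ for some $U \sim U(0, 1)$. It follows that

$$\mathrm{VaR}_\alpha(L_1 + L_2) = \mathrm{VaR}_\alpha(F_1^{\leftarrow}(U) + F_2^{\leftarrow}(U)) = F_{T(U)}^{\leftarrow}(\alpha),$$

where $T$ is the strictly increasing continuous function given by $T(x) = F_1^{\leftarrow}(x) +
F_2^{\leftarrow}(x)$. Now $P(T(U) \le T(\alpha)) = P(U \le \alpha) = \alpha$, so the result follows by
observing that

$$F_{T(U)}^{\leftarrow}(\alpha) = T(\alpha) = F_1^{\leftarrow}(\alpha) + F_2^{\leftarrow}(\alpha) = \mathrm{VaR}_\alpha(L_1) + \mathrm{VaR}_\alpha(L_2).$$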


Remark 6.16 (extensions). A more general form of the above result can be found
in Embrechts, Höing and Juri (2003) and is as follows. Let $\Psi : \mathbb{R}^d \to \mathbb{R}$ be
increasing and left-continuous in each argument, $0 < \alpha < 1$, and let $L_1, \ldots, L_d$ be
comonotonic rvs (not necessarily with continuous, strictly increasing dfs). Then


$$\mathrm{VaR}_\alpha(\Psi(L_1, \ldots, L_d)) = \Psi(\mathrm{VaR}_\alpha(L_1), \ldots, \mathrm{VaR}_\alpha(L_d)).$$
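
The additivity of VaR for comonotonic risks can be illustrated numerically. The sketch below builds two comonotonic losses from one common uniform variable, following the representation $(F_1^{\leftarrow}(U), F_2^{\leftarrow}(U))$ used in the proof of Proposition 6.15, and compares the empirical quantile of the sum with the sum of the empirical quantiles. The exponential and Pareto margins, the deterministic grid that stands in for draws of $U$, and the level $\alpha = 0.99$ are assumptions made for illustration.

```python
from math import ceil, log


def exponential_quantile(u):
    """Quantile function of the standard exponential distribution."""
    return -log(1.0 - u)


def pareto_quantile(u, shape=1.5, scale=1.0):
    """Quantile function of the df F(x) = 1 - (scale / (scale + x))**shape, x >= 0."""
    return scale * ((1.0 - u) ** (-1.0 / shape) - 1.0)


def empirical_quantile(sample, alpha):
    """Smallest order statistic whose empirical df is at least alpha."""
    xs = sorted(sample)
    return xs[max(0, ceil(alpha * len(xs)) - 1)]


n, alpha = 100_000, 0.99
grid = [(i + 0.5) / n for i in range(n)]         # stands in for draws of U ~ U(0, 1)
l1 = [exponential_quantile(u) for u in grid]     # L_1 = F_1^{<-}(U)
l2 = [pareto_quantile(u) for u in grid]          # L_2 = F_2^{<-}(U), same U: comonotonic
total = [a + b for a, b in zip(l1, l2)]

var_sum = empirical_quantile(total, alpha)
sum_of_vars = empirical_quantile(l1, alpha) + empirical_quantile(l2, alpha)
exact = exponential_quantile(alpha) + pareto_quantile(alpha)
print(var_sum, sum_of_vars, exact)
```

Because both losses are increasing functions of the same variable, the order statistics of the sum are the sums of the order statistics, so the two empirical quantities agree up to rounding.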

A third correlation fallacy. Based on the above result we can highlight a third
important fallacy concerning correlation to add to the two in Section 5.2.1.


Fallacy 3. VaR for the sum of two risks is at its worst when these two risks have 
maximal correlation, i.e. are comonotonic. 


Any superadditive VaR example yields a correction to this statement; one such 
case was shown in Example 6.7 and a further one is given below in Example 6.22. In 
a superadditive VaR situation we have $\mathrm{VaR}_\alpha(L_1 + L_2) > \mathrm{VaR}_\alpha(L_1) + \mathrm{VaR}_\alpha(L_2)$
for two risks $L_1$ and $L_2$ and some confidence level $\alpha$. By Proposition 6.15 and
Remark 6.16 the right-hand side $\mathrm{VaR}_\alpha(L_1) + \mathrm{VaR}_\alpha(L_2)$ corresponds to the VaR of
$L_1 + L_2$ when $L_1$ and $L_2$ are comonotonic. Moreover, Theorem 5.25 and
Example 6.14 imply that the correlation of $L_1$ and $L_2$ is maximal in the comonotonic case.
Hence the superadditive portfolio case must correspond to a smaller correlation. The 
remainder of Section 6.2 is devoted to the issue of finding the worst case. 


Remark 6.17. For expected shortfall the expression $\varrho(L_1 + L_2)$ is maximized for
comonotonic losses. To see this, note that Proposition 6.15 together with (2.23)
imply that expected shortfall also has the comonotonic additivity property. Since
expected shortfall is coherent, we have $\varrho(L_1 + L_2) \le \varrho(L_1) + \varrho(L_2)$, so that
comonotonicity is in fact the worst possible case. There exists a whole class of 
coherent risk measures, known as spectral risk measures, which share this property 
(see Notes and Comments). Note, also, that if we work with VaR but restrict our 
attention to elliptical distributions for the vector L, then VaR is a coherent risk 
measure (Theorem 6.8). Fallacy 3 is taken out of play and comonotonicity does 
correspond to the worst case. 


Restrictions on dependence using copulas. Before discussing bounds on VaR we 
need to formalize the restrictions we make on the dependence structure of the df 
$F$ of $L$. Recall that in the case of continuous marginal dfs $F_i$, there is a unique
copula $C$ such that $F = C(F_1, \ldots, F_d)$, and one possibility is to impose dependence
restrictions on $L_1, \ldots, L_d$ through conditions on $C$. Recall from Theorem 5.7 that
$W \le C \le M$, where $W$ and $M$ denote the Fréchet lower and upper bounds,
respectively.


We introduce dependence restrictions of the following type. 
(A2) $C \ge C_0$ for a copula $C_0$.

When $d = 2$ the case of unconstrained optimization can be treated as a special case of
restriction (A2) by setting $C_0 = W$, since $W$ is a proper copula in this case; however,
for $d > 2$ unconstrained optimization is not a special case of restriction (A2).
The case where $C_0 = \Pi$, the independence copula in (5.6), corresponds to so-called
positive lower orthant dependence (PLOD) (see Müller and Stoyan 2002,
Definition 3.10.1). In Theorem 3.10.4 of Müller and Stoyan (2002), it is shown that,
if $\operatorname{cov}(f(L), g(L)) \ge 0$ for all increasing functions $f, g : \mathbb{R}^d \to \mathbb{R}$, then $L$ is
PLOD.

Note that the relation “$\ge$” in (A2) is not a complete ordering on the space of all
copulas, meaning that for any two copulas $C_1$ and $C_2$ it is not necessarily true that
either $C_1 \ge C_2$ or $C_2 \ge C_1$. As a consequence, a constraint of the type (A2) may
only give a restrictive view on dependence alternatives.


Notation for the optimization problem. In order to formulate some of the key results 
for the optimization problem (6.11), we need some extra notation. Given a vector 
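$x = (x_1, \ldots, x_d)' \in \mathbb{R}^d$, we write $x_{-i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)'$. Also, for
$x_{-d} \in \mathbb{R}^{d-1}$ fixed, we define $\psi^{\wedge}_{x_{-d}}(s) := \sup\{x_d \in \mathbb{R} : \Psi(x_{-d}, x_d) < s\}$ for $s \in \mathbb{R}$.
In our set-up, it is convenient to identify the df $F$ of $L$ given fixed margins with the
copula $C$ that combines the margins to give the df $C(F_1, \ldots, F_d)$. Denote by $\mu_C$
the corresponding probability measure on $\mathbb{R}^d$ and define, for $s \in \mathbb{R}$,

$$\sigma_{C,\Psi}(F_1, \ldots, F_d)(s) := \mu_C(\Psi(L) < s),$$

$$\tau_{C,\Psi}(F_1, \ldots, F_d)(s) := \sup_{x_1, \ldots, x_{d-1} \in \mathbb{R}} C(F_1(x_1), \ldots, F_{d-1}(x_{d-1}), F_d^{-}(\psi^{\wedge}_{x_{-d}}(s))),$$

where $F_d^{-}(x)$ stands for the left limit of $F_d$ in $x$. It follows that

$$m_\Psi(s) := \inf\{P(\Psi(L) < s) : L_i \sim F_i,\ i = 1, \ldots, d\} = \inf\{\sigma_{C,\Psi}(F_1, \ldots, F_d)(s) : C \in \mathcal{C}_d\},$$

where $\mathcal{C}_d$ denotes the set of all $d$-dimensional copulas.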

Remark 6.18. The strict inequality “<” in the definition of $m_\Psi(s)$ is essential (see
Embrechts and Puccetti 2005, Remark 3.1(ii)).


Optimization subject to proper copula constraints. It turns out that a proper lower 
copula constraint as in (A2) allows for an easier analysis. Recall that the
unconstrained case $C \ge W$ is a special case of (A2) only if $d = 2$.


Theorem 6.19 (lower bound with partial information). Let $L$ be a random vector
in $\mathbb{R}^d$ ($d \ge 2$) having margins $F_1, \ldots, F_d$ and copula $C$. Assume that there exists
a copula $C_0$ such that $C \ge C_0$ (i.e. Assumption (A2) holds). If $\Psi : \mathbb{R}^d \to \mathbb{R}$ is
increasing, then, for $s \in \mathbb{R}$,

$$\sigma_{C,\Psi}(F_1, \ldots, F_d)(s) \ge \tau_{C_0,\Psi}(F_1, \ldots, F_d)(s).$$ (6.13)

If, moreover, $\Psi$ is right-continuous in its last argument, then the copula

$$C_t(u) := \begin{cases} \max(t, C_0(u)), & u \in [t, 1]^d, \\ \min\{u_1, \ldots, u_d\}, & \text{otherwise}, \end{cases}$$

where $t = \tau_{C_0,\Psi}(F_1, \ldots, F_d)(s)$, attains the bound in (6.13).


Proof. See Theorems 3.1 and 3.2 in Embrechts and Puccetti (2005). 


Translated into the language of VaR and using the notation $\mathrm{VaR}_{\alpha,\max} :=
\tau_{C_0,\Psi}(F_1, \ldots, F_d)^{-1}(\alpha)$ for the inverse of the $\tau$ function in (6.13), Theorem 6.19
becomes

$$\mathrm{VaR}_\alpha(\Psi(L)) \le \mathrm{VaR}_{\alpha,\max},$$ (6.14)

for $0 < \alpha < 1$, which gives an upper bound of the kind in (6.10). If $\Psi$ is given by
the sum operator, abbreviated to $\Psi = +$, this bound is

$$\mathrm{VaR}_{\alpha,\max} = \inf_{u \in [0,1]^d,\ C_0(u) = \alpha} (F_1^{\leftarrow}(u_1) + \cdots + F_d^{\leftarrow}(u_d)).$$ (6.15)
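
For $d = 2$ the infimum in (6.15) runs over a one-dimensional curve and can be approximated by a simple grid search. The sketch below does this for the two lower copulas mentioned in the text, $C_0 = W$ and $C_0 = \Pi$. The standard exponential margins, the grid size and the function names are assumptions made for illustration; the endpoints of the curve are left out of the grid, which is harmless here because the quantile function is unbounded as its argument approaches one.

```python
from math import log


def exponential_quantile(u):
    """Quantile function of the standard exponential distribution (illustrative margin)."""
    return -log(1.0 - u)


def var_max_two_risks(q1, q2, alpha, lower_copula="W", n_grid=2000):
    """Grid approximation of (6.15) for d = 2 and the sum operator.

    lower_copula = "W":  level set u1 + u2 - 1 = alpha  (no dependence information)
    lower_copula = "Pi": level set u1 * u2 = alpha      (PLOD constraint)
    """
    best = float("inf")
    for k in range(n_grid):
        u1 = alpha + (1.0 - alpha) * (k + 0.5) / n_grid   # u1 in (alpha, 1)
        if lower_copula == "W":
            u2 = 1.0 + alpha - u1
        elif lower_copula == "Pi":
            u2 = alpha / u1
        else:
            raise ValueError("lower_copula must be 'W' or 'Pi'")
        best = min(best, q1(u1) + q2(u2))
    return best


alpha = 0.95
q = exponential_quantile
bound_w = var_max_two_risks(q, q, alpha, "W")
bound_pi = var_max_two_risks(q, q, alpha, "Pi")
comonotonic = 2.0 * q(alpha)
closed_form_w = 2.0 * log(2.0 / (1.0 - alpha))   # minimiser u1 = u2 = (1 + alpha) / 2
print(comonotonic, bound_pi, bound_w, closed_form_w)
```

The bound under the stronger constraint $C \ge \Pi$ is no larger than the unconstrained bound, and both exceed the comonotonic value.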
The unconstrained case. The unconstrained case for $d > 2$ is more difficult. First
of all, the standard bound (6.13) evaluated at $C_0 = W$ still holds but may fail to
be sharp. For $\Psi = +$ and $F_1 = \cdots = F_d = F$ with $F$ a continuous df on $\mathbb{R}^+$, it
reduces to

$$\tau_{W,+}(F, \ldots, F)(s) = (dF(s/d) - d + 1)^+$$ (6.16)

for large enough s (see Embrechts and Puccetti (2005) for details). The next result 
yields a better bound. 


Theorem 6.20 (a better bound in the unconstrained case). Let $F$ be a continuous
df on $\mathbb{R}^+$ and let $F_1 = \cdots = F_d = F$. Then, for all $s \ge 0$ and $\bar{F} = 1 - F$,

$$m_+(s) \ge 1 - d \inf_{r \in [0, s/d)} \frac{\int_r^{s-(d-1)r} \bar{F}(x)\,dx}{s - dr}.$$ (6.17)


Proof. See Theorem 4.2 in Embrechts and Puccetti (2005).


Remark 6.21. The value of $m_+(s)$ can be closely approximated by solving two
linear programmes (see Embrechts and Puccetti 2005; Embrechts, Höing and Juri
2003).


Examples. In a first example we consider the special, though important, case when
$F_1 = F_2 = \Phi$, the standard normal df. The second example considers
higher-dimensional portfolios with Pareto margins.


Example 6.22 (worst VaR for a portfolio with normal margins). For $i = 1, 2$
let $F_i = \Phi$. In Figure 6.1 we have plotted the worst $\mathrm{VaR}_\alpha(L_1 + L_2)$
calculated using (6.16) as a function of $\alpha$ together with the curve corresponding to
the comonotonic case calculated using Proposition 6.15. The fact that the
former lies above the latter implies the existence of portfolios with normal
margins for which VaR is not subadditive. For example, for $\alpha = 0.95$, the upper
bound is 3.92, whereas $\mathrm{VaR}_\alpha(L_i) = 1.645$, so, for the worst VaR portfolio,
$\mathrm{VaR}_{0.95}(L_1 + L_2) = 3.92 > 3.29 = \mathrm{VaR}_{0.95}(L_1) + \mathrm{VaR}_{0.95}(L_2)$. The
worst-case copula is shown in Figure 6.2 (see Embrechts, Höing and Puccetti (2005) for
further details).
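
The numbers quoted in Example 6.22 can be reproduced by inverting (6.16). Setting $dF(s/d) - d + 1 = \alpha$ and solving for $s$ gives $s = dF^{\leftarrow}(1 - (1 - \alpha)/d)$, which for $d = 2$ and $F = \Phi$ is $2\Phi^{-1}((1 + \alpha)/2)$. The sketch below evaluates this expression and the comonotonic value from Proposition 6.15; the function names are illustrative, and the sketch takes the expression in (6.16) to be valid at the confidence levels shown.

```python
from statistics import NormalDist

PHI = NormalDist()   # standard normal distribution


def worst_var_standard_bound(alpha, d=2):
    """Solve d * Phi(s / d) - d + 1 = alpha for s, i.e. invert (6.16) with F = Phi."""
    return d * PHI.inv_cdf(1.0 - (1.0 - alpha) / d)


def comonotonic_var(alpha, d=2):
    """VaR of the sum of d comonotonic standard normal risks (Proposition 6.15)."""
    return d * PHI.inv_cdf(alpha)


for alpha in (0.90, 0.95, 0.99):
    print(alpha, round(comonotonic_var(alpha), 2), round(worst_var_standard_bound(alpha), 2))
```

For $\alpha = 0.95$ this gives 3.29 for the comonotonic case and 3.92 for the worst case, in line with the example.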


As explained in Theorem 6.20, the case $d \ge 3$ is more subtle, as the standard
bound (6.15) fails to be sharp. The strictly lower bound (6.17) in the case of identical 
distributions can be computed easily. In Section 10.1.4 we will show that operational 
risk losses can be modelled reasonably well by heavy-tailed Pareto distributions with 
infinite variance. In the case of operational risk one faces the calculation of VaRs 
at the 99% (or even higher) level across numerous (up to 56) classes of risk. The 
dependence between the loss rvs for these classes is mostly unknown, so we face 
the above unconstrained optimization problem for $\mathrm{VaR}_\alpha(L_1 + \cdots + L_d)$. The next
example contains some calculations for Pareto portfolios. 

Figure 6.1. The worst-case $\mathrm{VaR}_\alpha$ (solid line) plotted against $\alpha$ for two standard normal
risks; the case of comonotonic risks (dotted line) is shown as a comparison.

Figure 6.2. Contour and perspective plots of the density function of the distribution of
$(L_1, L_2)$ leading to the worst-case VaR for $L_1 + L_2$ at the $\alpha = 0.95$ level when the $L_i$ are
standard normal.


Example 6.23 (VaR bounds for Pareto portfolios). Suppose that $L_i \sim \mathrm{Pa}(1.5, 1)$
for $i = 1, \ldots, d$ so that $E(L_i) = 2$ and $\operatorname{var}(L_i) = \infty$. In the unconstrained case,
Table 6.1 contains the bounds obtained from Theorem 6.20 (which, for reasons we
will not discuss, are known as dual bounds). The portfolio sizes 8 and 56 have been
chosen with the operational risk problem in mind, as explained above, whereas 100
and 1000 could represent the sizes of typical credit portfolios. The assumption of a
single common Pareto distribution for all individual losses is of course a
simplification for computational purposes.

Table 6.1. Bounds for $\mathrm{VaR}_\alpha(\sum_{i=1}^{d} L_i)$ for portfolios of Pa(1.5, 1)-distributed risks are
given in columns marked “dual”; columns marked “com” give values in the comonotonic
case. Numbers are expressed in thousands.

           d = 8            d = 56           d = 100          d = 1000
α          com     dual     com     dual     com     dual     com      dual
0.90       0.03    0.08     0.20    0.67     0.36    1.23      3.64    12.73
0.95       0.05    0.14     0.36    1.10     0.64    2.00      6.37    20.77
0.99       0.16    0.41     1.15    3.32     2.05    6.05     20.54    62.66
0.999      0.79    1.93     5.54   15.63     9.90   28.43     99.00   294.47
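
The dual bound of Theorem 6.20 is straightforward to evaluate for Pareto margins because the integral of the survival function is available in closed form. The sketch below assumes the parametrization $\bar{F}(x) = (\kappa/(\kappa + x))^{\theta}$ for Pa($\theta$, $\kappa$), which is consistent with $E(L_i) = 2$ for Pa(1.5, 1). It approximates the infimum over $r$ on a grid and then finds the VaR bound by bisection on $s$. The grid size, the tolerance and the function names are illustrative choices, and this is not claimed to be the numerical method used to produce the published figures.

```python
def pareto_tail_integral(a, b, shape=1.5, scale=1.0):
    """Integral over [a, b] of the survival function (scale / (scale + x))**shape."""
    c = scale ** shape / (shape - 1.0)
    return c * ((scale + a) ** (1.0 - shape) - (scale + b) ** (1.0 - shape))


def dual_df_bound(s, d, n_grid=2000):
    """Right-hand side of (6.17), with the infimum over r taken on a grid in [0, s/d)."""
    best = float("inf")
    for k in range(n_grid):
        r = (s / d) * k / n_grid
        ratio = pareto_tail_integral(r, s - (d - 1) * r) / (s - d * r)
        best = min(best, ratio)
    return 1.0 - d * best


def dual_var_bound(alpha, d, tol=1e-6):
    """Smallest s (up to a relative tolerance) with dual_df_bound(s, d) >= alpha."""
    lo, hi = 1e-9, 1.0
    while dual_df_bound(hi, d) < alpha:    # bracket the solution by doubling
        hi *= 2.0
    while hi - lo > tol * hi:              # bisection; the bound is increasing in s
        mid = 0.5 * (lo + hi)
        if dual_df_bound(mid, d) >= alpha:
            hi = mid
        else:
            lo = mid
    return hi


def comonotonic_var(alpha, d, shape=1.5, scale=1.0):
    """d times the alpha-quantile of a single Pa(shape, scale) risk."""
    return d * scale * ((1.0 - alpha) ** (-1.0 / shape) - 1.0)


dual_8 = dual_var_bound(0.99, 8)
com_8 = comonotonic_var(0.99, 8)
print(round(com_8 / 1000, 2), round(dual_8 / 1000, 2))   # in thousands, as in Table 6.1
```

For $d = 8$ and $\alpha = 0.99$ the sketch reproduces the corresponding entries 0.16 and 0.41 of Table 6.1 to two decimals.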


Notes and Comments 


There is a large literature on Fréchet problems. Our discussion is mainly based 
on Embrechts, Höing and Juri (2003), Embrechts, Höing and Puccetti (2005)
and Embrechts and Puccetti (2005). These papers also contain the most important 
references to the existing literature. Historically, the question of bounding the df of 
a sum of rvs with given marginals goes back to Kolmogorov and was answered by 
Makarov (1981) for d = 2. Frank, Nelsen and Schweizer (1987) restated Makarov’s 
result using the notion of a copula. Independently, Rüschendorf (1982) gave a very
elegant proof of the same result using duality. Williamson and Downs (1990) intro- 
duced the use of dependence information. 

Fallacy 3 originally appeared in Embrechts, McNeil and Straumann (2002); it 
ceases to be a fallacy if we replace VaR by expected shortfall or a spectral risk 
measure. For spectral risk measures see Kusuoka (2001), Acerbi (2002) and Tasche 
(2002). A closely related class of risk measures mainly used in insurance applications 
is referred to as distortion or Wang measures (see Wang 1996). A nice discussion is 
to be found in Denuit and Charpentier (2004). 

Embrechts, Höing and Juri (2003) gave the most general theorem for general $d$
and $\Psi$; their main result on the sharpness of the bounds for $d \ge 3$ and no constraints,
however, contains an error: this was corrected in Embrechts and Puccetti (2005). For
the construction of the copula(s) leading to the worst VaR, see Embrechts, Höing
and Puccetti (2005). Numerous other authors (especially in analysis and actuarial
mathematics) have contributed to this area and we refer to the above papers for
references. Besides the comprehensive book by Müller and Stoyan (2002), several
other texts in actuarial mathematics contain interesting contributions on dependence 
modelling (see, for example, Chapter 10 in Kaas et al. (2001) for a start). A rich set 
of optimization problems within an actuarial context are to be found in De Vylder 
(1996): see especially “Part II: Optimization Theory”, where the author “shows how 
to obtain best upper and lower bounds on functionals T(F) of the df F of a risk, 
under moment or other integral constraints”. An excellent account is to be found 
in Denuit and Charpentier (2004). The definitive account from an actuarial point of
view is Denuit et al. (2005). 

Rosenberg and Schuermann (2004) give some idea of the applicability of aggrega- 
tion ideas used in this chapter. They construct the joint risk distribution for a typical, 
large, internationally active bank using the method of copulas and aggregate risk 
measures across the categories of market, credit and operational risk. 


6.3 Capital Allocation 
6.3.1 The Allocation Problem 


Consider an investor who can invest in a fixed set of $d$ different investment possibilities
with losses represented by the rvs $L_1, \ldots, L_d$. We have the following economic
interpretations depending on the area of application. 


Performance measurement. Here the investor is a financial institution and the $L_i$
represent the (negative of the) P&L of $d$ different lines of business.


Loan pricing. Here the investor is a loan book manager responsible for a portfolio 
of d loans. 


General investment. Here we consider either an individual or institutional investor
and the standard interpretation that the $L_i$ are (negative) P&Ls corresponding to
a set of investments in various assets.


The performance of the different business units or investments is usually mea- 
sured using some sort of RORAC (return on risk-adjusted capital) approach, i.e. by 
considering a ratio of the form 


expected profit/risk capital, (6.18) 


where we leave the precise definition of the terms vague. In many applications risk 
capital might correspond to economic capital: the capital derived by considering the 
fluctuation of the loss around the expected loss (the unexpected loss), rather than 
the absolute loss. Similarly, in a modern approach to loan pricing, the spread of a 
loan contains a risk premium component, which is computed by applying a target 
interest rate to the risk capital needed to sustain an individual loan (see Section 9.3.4 
for details). 

Obviously the general approach embodied in (6.18) raises the question of what the 
appropriate risk capital for an individual investment opportunity might be. Thus the 
question of performance of the investment is intimately connected with the subject 
of risk measurement as addressed in Sections 2.2 and 6.1. A two-step procedure is 
used in practice. 


(1) Compute the overall risk capital $\varrho(L)$, where $L = \sum_{i=1}^{d} L_i$ and $\varrho$ is a
particular risk measure such as VaR or ES; note that at this stage we are not
stipulating that $\varrho$ must be coherent.

(2) Allocate the capital $\varrho(L)$ to the individual investment possibilities according
to some mathematical capital allocation principle such that, if $AC_i$ denotes
the capital allocated to the investment with potential loss $L_i$, the sum of the
allocated amounts corresponds to the overall risk capital $\varrho(L)$.


In this section we are interested in step (2) of the procedure; loosely speaking we
require a mapping that takes as input the individual losses $L_1, \ldots, L_d$ and the risk
measure $\varrho$ and yields as output the vector $(AC_1, \ldots, AC_d)$ such that

$$\varrho(L) = \sum_{i=1}^{d} AC_i,$$ (6.19)

and such a mapping will be called a capital allocation principle. The relation (6.19)
is sometimes called the full allocation property since all of the overall risk capital
$\varrho(L)$ (not more, not less) is allocated to the investment possibilities; we consider this
property to be an integral part of the definition of an allocation principle. Of course, 
there are other properties of a capital allocation principle that are desirable from an 
economic viewpoint; we first make some formal definitions and give examples of 
allocation properties before discussing further properties. 


The formal set-up. Let $L_1, \ldots, L_d$ be rvs on a common probability space
$(\Omega, \mathcal{F}, P)$ representing losses (or profits) for $d$ investment possibilities. For our
discussion it will be useful to consider portfolios where the weights of the individual
investment opportunities are varied with respect to our basic portfolio $(L_1, \ldots, L_d)$,
which is regarded as a fixed random vector. That is, we consider an open set
$\Lambda \subset \mathbb{R}^d \setminus \{0\}$ of portfolio weights and define for $\lambda \in \Lambda$ the loss $L(\lambda) = \sum_{i=1}^{d} \lambda_i L_i$;
the loss of our actual portfolio is of course $L(1)$. Let $\varrho$ be some risk measure defined
on a set $\mathcal{M}$ which contains the rvs $\{L(\lambda) : \lambda \in \Lambda\}$. We then define the associated
risk-measure function $r_\varrho : \Lambda \to \mathbb{R}$ by $r_\varrho(\lambda) = \varrho(L(\lambda))$. Thus $r_\varrho(\lambda)$ is the required
risk capital for a position $\lambda$ in the set of investment possibilities.


Definition 6.24. Let $r_\varrho$ be a risk-measure function on some set $\Lambda \subset \mathbb{R}^d \setminus \{0\}$
such that $1 \in \Lambda$. A mapping $\pi^{r_\varrho} : \Lambda \to \mathbb{R}^d$ is called a per-unit capital allocation
principle associated with $r_\varrho$ if, for all $\lambda \in \Lambda$, we have

$$\sum_{i=1}^{d} \lambda_i \pi_i^{r_\varrho}(\lambda) = r_\varrho(\lambda).$$ (6.20)


The interpretation of this definition is that $\pi_i^{r_\varrho}(\lambda)$ gives the amount of capital allocated
to one unit of $L_i$, when the overall position has loss $L(\lambda)$. The amount of capital
allocated to the position $\lambda_i L_i$ is thus $\lambda_i \pi_i^{r_\varrho}(\lambda)$ and the equality (6.20) simply means that
the overall risk capital $r_\varrho(\lambda)$ is fully allocated to the individual portfolio positions.


6.3.2 The Euler Principle and Examples 


From now on we restrict our attention to risk measures that are positive homogeneous
(satisfying Axiom 6.3 in Section 6.1.1), such as a coherent risk measure, but also
the standard deviation risk measure or VaR. Obviously the associated risk-measure
function must satisfy $r_\varrho(t\lambda) = t r_\varrho(\lambda)$ for all $t > 0$, $\lambda \in \Lambda$, so $r_\varrho : \Lambda \to \mathbb{R}$ is
a positive-homogeneous function of a vector argument. Recall Euler’s well-known
rule that states that if $r_\varrho$ is positive homogeneous and differentiable at $\lambda \in \Lambda$, we
have

$$r_\varrho(\lambda) = \sum_{i=1}^{d} \lambda_i \frac{\partial r_\varrho}{\partial \lambda_i}(\lambda).$$ (6.21)


Comparison of (6.21) with (6.20) suggests the following definition. 


Definition 6.25 (Euler capital allocation principle). If $r_\varrho$ is a positive-homogeneous
risk-measure function, which is differentiable on the set $\Lambda$, then the per-unit
Euler capital allocation principle associated with $r_\varrho$ is the mapping

$$\pi^{r_\varrho} : \Lambda \to \mathbb{R}^d, \qquad \pi_i^{r_\varrho}(\lambda) = \frac{\partial r_\varrho}{\partial \lambda_i}(\lambda).$$ (6.22)

The Euler principle is sometimes called allocation by the gradient, since
$\pi^{r_\varrho}(\lambda) = \nabla r_\varrho(\lambda)$. Obviously the Euler principle gives a full allocation of the risk
capital. We now look at a number of concrete examples of Euler allocations
corresponding to different choices of risk measure $\varrho$.


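Standard deviation and the covariance principle. Consider the risk-measure function
$r_{\mathrm{SD}}(\lambda) = \sqrt{\operatorname{var}(L(\lambda))}$ and write $\Sigma$ for the covariance matrix of $(L_1, \ldots, L_d)$.
Then we have $r_{\mathrm{SD}}(\lambda) = (\lambda'\Sigma\lambda)^{1/2}$, from which it follows that

$$\pi_i^{r_{\mathrm{SD}}}(\lambda) = \frac{\partial r_{\mathrm{SD}}}{\partial \lambda_i}(\lambda) = \frac{(\Sigma\lambda)_i}{r_{\mathrm{SD}}(\lambda)} = \frac{\sum_{j=1}^{d} \operatorname{cov}(L_i, L_j)\lambda_j}{r_{\mathrm{SD}}(\lambda)} = \frac{\operatorname{cov}(L_i, L(\lambda))}{\sqrt{\operatorname{var}(L(\lambda))}}.$$

In particular, for the original portfolio of investment possibilities corresponding to
$\lambda = 1$, the capital allocated to the $i$th investment possibility is

$$AC_i = \pi_i^{r_{\mathrm{SD}}}(1) = \frac{\operatorname{cov}(L_i, L)}{\sqrt{\operatorname{var}(L)}}, \qquad L := L(1).$$ (6.23)

This formula is known as the covariance principle.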
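
The covariance principle is easy to compute once a covariance matrix is available. The following sketch evaluates the per-unit allocations $(\Sigma\lambda)_i / r_{\mathrm{SD}}(\lambda)$, checks the full allocation property (6.20), and compares the closed form with a finite-difference approximation of the gradient, as suggested by the Euler principle. The covariance matrix and the function names are hypothetical and serve only as an illustration.

```python
from math import sqrt


def r_sd(lam, sigma):
    """Standard deviation risk-measure function r_SD(lam) = sqrt(lam' Sigma lam)."""
    d = len(lam)
    return sqrt(sum(lam[i] * sigma[i][j] * lam[j] for i in range(d) for j in range(d)))


def covariance_allocation(lam, sigma):
    """Per-unit Euler allocations cov(L_i, L(lam)) / sqrt(var(L(lam)))."""
    d = len(lam)
    sd = r_sd(lam, sigma)
    return [sum(sigma[i][j] * lam[j] for j in range(d)) / sd for i in range(d)]


def numerical_gradient(lam, sigma, h=1e-6):
    """Central finite differences of r_SD, used only as a cross-check."""
    grad = []
    for i in range(len(lam)):
        up, down = list(lam), list(lam)
        up[i] += h
        down[i] -= h
        grad.append((r_sd(up, sigma) - r_sd(down, sigma)) / (2.0 * h))
    return grad


# Hypothetical covariance matrix of (L_1, L_2, L_3).
sigma = [[4.0, 1.0, 0.0],
         [1.0, 9.0, 2.0],
         [0.0, 2.0, 16.0]]
lam = [1.0, 1.0, 1.0]                      # the actual portfolio L = L(1)

ac = covariance_allocation(lam, sigma)
print(ac, sum(ac), r_sd(lam, sigma))
```

With $\lambda = 1$ the allocations sum to the portfolio standard deviation, which is the full allocation property.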


VaR and VaR contributions. Suppose that $r^{\alpha}_{\mathrm{VaR}}(\lambda) = q_\alpha(L(\lambda))$. In this case it can
be shown that, subject to technical conditions,

$$\pi_i^{r^{\alpha}_{\mathrm{VaR}}}(\lambda) = \frac{\partial r^{\alpha}_{\mathrm{VaR}}}{\partial \lambda_i}(\lambda) = E(L_i \mid L(\lambda) = q_\alpha(L(\lambda))), \qquad 1 \le i \le d.$$ (6.24)

The derivation of (6.24) is more involved than that of the covariance principle and
we give a justification following Tasche (2000) under the simplifying assumption
that the loss distribution of $(L_1, \ldots, L_d)$ has a joint density. In the following lemma
we denote by $\phi(u, l_2, \ldots, l_d) = f_{L_1 \mid L_2, \ldots, L_d}(u \mid l_2, \ldots, l_d)$ the conditional density
of $L_1$.

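Lemma 6.26. Assume that $d \ge 2$ and that $(L_1, \ldots, L_d)$ has a joint density. Then,
for any vector $(\lambda_1, \ldots, \lambda_d)$ of portfolio weights such that $\lambda_1 \ne 0$, we find that

(i) $L(\lambda)$ has density

$$f_{L(\lambda)}(t) = |\lambda_1|^{-1} E\Big(\phi\Big(\lambda_1^{-1}\Big(t - \sum_{j=2}^{d} \lambda_j L_j\Big), L_2, \ldots, L_d\Big)\Big);$$

and

(ii) for $i = 2, \ldots, d$,

$$E(L_i \mid L(\lambda) = t) = \frac{E\big(L_i\, \phi\big(\lambda_1^{-1}\big(t - \sum_{j=2}^{d} \lambda_j L_j\big), L_2, \ldots, L_d\big)\big)}{E\big(\phi\big(\lambda_1^{-1}\big(t - \sum_{j=2}^{d} \lambda_j L_j\big), L_2, \ldots, L_d\big)\big)}.$$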

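Proof. For (i) consider the case $\lambda_1 > 0$ and observe that we can write

$$P(L(\lambda) \le t) = E(P(L(\lambda) \le t \mid L_2, \ldots, L_d))$$

$$= E\Big(P\Big(L_1 \le \lambda_1^{-1}\Big(t - \sum_{j=2}^{d} \lambda_j L_j\Big) \,\Big|\, L_2, \ldots, L_d\Big)\Big)$$

$$= E\Big(\int_{-\infty}^{\lambda_1^{-1}(t - \sum_{j=2}^{d} \lambda_j L_j)} \phi(u, L_2, \ldots, L_d)\,du\Big).$$

The assertion follows on differentiating under the expectation.
For (ii) observe that we can write

$$E(L_i \mid L(\lambda) = t) = \lim_{\delta \to 0} \frac{\delta^{-1} E(L_i I_{\{t < L(\lambda) \le t + \delta\}})}{\delta^{-1} P(t < L(\lambda) \le t + \delta)} = \frac{(\partial/\partial t) E(L_i I_{\{L(\lambda) \le t\}})}{f_{L(\lambda)}(t)},$$

provided $f_{L(\lambda)}(t) \ne 0$. The result follows on applying a similar conditioning
technique to the ones used in the proof of (i) to the numerator.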


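We now explain why (6.24) follows from Lemma 6.26. Since the rv $L(\lambda)$ has a
density, we have $P(L(\lambda) \le q_\alpha(L(\lambda))) = \alpha$. Writing $k(t) = \lambda_1^{-1}(t - \sum_{j=2}^{d} \lambda_j L_j)$
we have

$$\alpha = P(L(\lambda) \le r^{\alpha}_{\mathrm{VaR}}(\lambda)) = E\Big(\int_{-\infty}^{k(r^{\alpha}_{\mathrm{VaR}}(\lambda))} \phi(u, L_2, \ldots, L_d)\,du\Big).$$ (6.25)

We take derivatives of (6.25) with respect to $\lambda_i$ for $i = 2, \ldots, d$ to get

$$0 = \lambda_1^{-1} E\Big(\Big(\frac{\partial r^{\alpha}_{\mathrm{VaR}}}{\partial \lambda_i}(\lambda) - L_i\Big) \phi(k(r^{\alpha}_{\mathrm{VaR}}(\lambda)), L_2, \ldots, L_d)\Big).$$

Solving this expression for $\partial r^{\alpha}_{\mathrm{VaR}}(\lambda)/\partial \lambda_i$ and using part (ii) of Lemma 6.26
yields (6.24), as desired. Analogous calculations can be done for $i = 1$ and $\lambda_1 < 0$.
Tasche (2000) makes the derivations mathematically rigorous by using the implicit
function theorem and giving all necessary conditions. In summary, the capital
allocation takes the form $AC_i = E(L_i \mid L = \mathrm{VaR}_\alpha(L))$, $L := L(1)$.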
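
The relation (6.24) can be verified in a case where both sides are known explicitly. The sketch below assumes a centred multivariate normal loss vector, for which $r^{\alpha}_{\mathrm{VaR}}(\lambda) = \Phi^{-1}(\alpha)\sqrt{\lambda'\Sigma\lambda}$ and the conditional expectation $E(L_i \mid L(\lambda) = t)$ is the linear regression $\operatorname{cov}(L_i, L(\lambda))\,t/\operatorname{var}(L(\lambda))$. It compares this conditional expectation at $t = q_\alpha(L(\lambda))$ with a finite-difference derivative of the risk-measure function. The covariance matrix, the confidence level and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def portfolio_variance(lam, sigma):
    d = len(lam)
    return sum(lam[i] * sigma[i][j] * lam[j] for i in range(d) for j in range(d))


def r_var(lam, sigma, alpha):
    """VaR_alpha of L(lam) for centred Gaussian losses (an illustrative assumption)."""
    return NormalDist().inv_cdf(alpha) * sqrt(portfolio_variance(lam, sigma))


def conditional_mean_given_total(i, lam, sigma, t):
    """E(L_i | L(lam) = t) for centred Gaussian losses: regression of L_i on L(lam)."""
    cov_i = sum(sigma[i][j] * lam[j] for j in range(len(lam)))
    return cov_i / portfolio_variance(lam, sigma) * t


def var_derivative(i, lam, sigma, alpha, h=1e-6):
    """Central finite-difference approximation of the partial derivative of r_var."""
    up, down = list(lam), list(lam)
    up[i] += h
    down[i] -= h
    return (r_var(up, sigma, alpha) - r_var(down, sigma, alpha)) / (2.0 * h)


# Hypothetical covariance matrix and confidence level.
sigma = [[4.0, 1.0, 0.0],
         [1.0, 9.0, 2.0],
         [0.0, 2.0, 16.0]]
lam, alpha = [1.0, 1.0, 1.0], 0.99

total_var = r_var(lam, sigma, alpha)
contributions = [conditional_mean_given_total(i, lam, sigma, total_var) for i in range(3)]
derivatives = [var_derivative(i, lam, sigma, alpha) for i in range(3)]
print(contributions, derivatives, total_var)
```

In this Gaussian setting the VaR contributions are proportional to the covariance-principle allocations, and they sum to the portfolio VaR.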



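Expected shortfall and shortfall contributions. Now consider using the risk-measure
function $r^{\alpha}_{\mathrm{ES}}(\lambda) = E(L(\lambda) \mid L(\lambda) \ge q_\alpha(L(\lambda)))$ corresponding to expected
shortfall. It follows from Definition 2.15 that we can write

$$r^{\alpha}_{\mathrm{ES}}(\lambda) = \frac{1}{1 - \alpha} \int_\alpha^1 r^{u}_{\mathrm{VaR}}(\lambda)\,du,$$

where we make use of the notation $r^{u}_{\mathrm{VaR}}(\lambda) = q_u(L(\lambda))$ as above. We apply the
Euler principle by again computing the derivative with respect to $\lambda_i$. Assuming the
differentiability of $r^{u}_{\mathrm{VaR}}(\lambda)$, we have

$$\frac{\partial r^{\alpha}_{\mathrm{ES}}}{\partial \lambda_i}(\lambda) = \frac{1}{1 - \alpha} \int_\alpha^1 \frac{\partial r^{u}_{\mathrm{VaR}}}{\partial \lambda_i}(\lambda)\,du = \frac{1}{1 - \alpha} \int_\alpha^1 E(L_i \mid L(\lambda) = q_u(L(\lambda)))\,du.$$

Now we assume that $f_{L(\lambda)}$ is strictly positive so that the df of $L(\lambda)$ has a differentiable
inverse and we can make the change of variables $v = q_u(L(\lambda)) = F_{L(\lambda)}^{-1}(u)$. Since
$dv/du = (f_{L(\lambda)}(v))^{-1}$, we get

$$\frac{\partial r^{\alpha}_{\mathrm{ES}}}{\partial \lambda_i}(\lambda) = \frac{1}{1 - \alpha} \int_{q_\alpha(L(\lambda))}^{\infty} E(L_i \mid L(\lambda) = v) f_{L(\lambda)}(v)\,dv = \frac{1}{1 - \alpha} E(L_i;\ L(\lambda) \ge q_\alpha(L(\lambda))).$$

Since $P(L(\lambda) \ge q_\alpha(L(\lambda))) = 1 - \alpha$, the right-hand side equals
$E(L_i \mid L(\lambda) \ge q_\alpha(L(\lambda)))$. This gives a capital allocation of the form

$$AC_i = E(L_i \mid L \ge \mathrm{VaR}_\alpha(L)), \qquad L := L(1),$$ (6.26)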


where AC; is known as the expected shortfall contribution of investment possibility 
(or line of business) i. This is a popular allocation principle in practice, and is 
generally considered to be preferable to the covariance principle and the principle 
based on VaR contributions. See Notes and Comments for literature on its use in 
practice in the context of credit portfolios. 
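
Expected shortfall contributions of the form (6.26) are commonly estimated from simulated losses by averaging each component over the scenarios in which the total loss is at least its empirical quantile. The sketch below is a minimal illustration of that estimator. The three-dimensional loss model, the sample size, the seed and the function name are hypothetical, and the estimator is a plain tail average rather than a method taken from the text.

```python
import random
from math import ceil, exp


def expected_shortfall_contributions(samples, alpha):
    """Tail-average estimates of AC_i = E(L_i | L >= VaR_alpha(L)).

    samples: list of loss vectors (l_1, ..., l_d), one per scenario.
    Returns the list of contributions, the VaR estimate and the ES estimate.
    """
    totals = [sum(row) for row in samples]
    n = len(totals)
    var_hat = sorted(totals)[max(0, ceil(alpha * n) - 1)]      # empirical alpha-quantile
    tail = [row for row, tot in zip(samples, totals) if tot >= var_hat]
    d = len(samples[0])
    contributions = [sum(row[i] for row in tail) / len(tail) for i in range(d)]
    es_hat = sum(tot for tot in totals if tot >= var_hat) / len(tail)
    return contributions, var_hat, es_hat


# Hypothetical dependent losses: exponential, lognormal and Gaussian components.
random.seed(1)
samples = []
for _ in range(20_000):
    z = random.gauss(0.0, 1.0)
    samples.append((random.expovariate(1.0), exp(0.5 * z), 0.5 * z + random.gauss(0.0, 1.0)))

contributions, var_hat, es_hat = expected_shortfall_contributions(samples, 0.99)
print(contributions, var_hat, es_hat)
```

By linearity of the tail average, the estimated contributions add up to the estimated expected shortfall, so the full allocation property holds for the estimates as well.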


Euler allocation for elliptical loss distributions. In the following corollary to The- 
orem 6.8 we consider the special case of an elliptical loss distribution for the vector 
of investment opportunities $(L_1, \ldots, L_d)$. We consider this distribution to be
centred at zero so that it really represents fluctuations of the loss around the expected
loss. We find that the relative amounts of capital allocated to each investment oppor- 
tunity are always the same, regardless of whether we base an Euler allocation on the 
standard deviation, VaR or expected shortfall risk measures, or indeed any positive- 
homogeneous risk measure. Thus allocation is very simple in this case: depending 
on our choice of risk measure we calculate the total risk capital to be allocated and 
then use a simple partitioning formula given in (6.27) below. 


Corollary 6.27. Assume that $r_\varrho : \Lambda \to \mathbb{R}$ is the risk-measure function of a
positive-homogeneous risk measure $\varrho$ depending only on the distribution of the loss. Let
$L \sim E_d(0, \Sigma, \psi)$. Then, under an Euler allocation, the relative capital allocation is
given by

$$\frac{AC_i}{AC_j} = \frac{\pi_i^{r_\varrho}(\mathbf{1})}{\pi_j^{r_\varrho}(\mathbf{1})}
  = \frac{\sum_{k=1}^d \Sigma_{ik}}{\sum_{k=1}^d \Sigma_{jk}}, \quad 1 \le i, j \le d. \tag{6.27}$$

Proof. From the proof of Theorem 6.8 we deduce that, by the positive homogeneity
of the risk measure, we have

$$r_\varrho(\lambda) = \varrho(L(\lambda)) = \varrho\Big(\sum_{i=1}^d \lambda_i L_i\Big)
  = \sqrt{\lambda' \Sigma \lambda}\; \varrho(Y_1),$$

where $Y_1$ is the first component of a spherical random vector with characteristic
generator $\psi$. For the allocation we get

$$\pi^{r_\varrho}(\lambda) = \nabla r_\varrho(\lambda) = \frac{\Sigma \lambda}{\sqrt{\lambda' \Sigma \lambda}}\, \varrho(Y_1),$$

from which the result follows.
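
The partitioning formula (6.27) is easy to evaluate once a dispersion matrix is given. The Python sketch below computes the fractions of total capital implied by (6.27) and, as a check, the Euler contributions for the standard deviation risk measure at $\lambda = \mathbf{1}$. The function names and the 2 x 2 matrix are hypothetical choices made purely for illustration.

```python
import math


def euler_weights_elliptical(sigma):
    """Fraction of total capital per unit implied by (6.27) at lambda = (1, ..., 1)."""
    row_sums = [sum(row) for row in sigma]  # sum_k Sigma_ik for each i
    total = sum(row_sums)                   # equals 1' Sigma 1
    return [s / total for s in row_sums]


def stdev_euler_contributions(sigma):
    """Euler contributions (Sigma 1)_i / sqrt(1' Sigma 1) for the standard deviation."""
    row_sums = [sum(row) for row in sigma]
    sd = math.sqrt(sum(row_sums))
    return [s / sd for s in row_sums]


sigma = [[4.0, 1.0], [1.0, 1.0]]  # hypothetical dispersion matrix
weights = euler_weights_elliptical(sigma)
contribs = stdev_euler_contributions(sigma)
```

Whatever positive-homogeneous risk measure is used, only the total capital changes; the split between the units is given by `weights`.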


6.3.3 Economic Justification of the Euler Principle 


Signals for performance measurement. A first economic justification for capital 
allocation based on the Euler principle was given by Tasche (1999), who addressed 
the issue of whether it gave “the right signals for investment decisions”. He formal- 
ized the idea as follows. 


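Definition 6.28. Let $r_\varrho$ be a risk-measure function which is differentiable on $\Lambda$
and $\pi^{r_\varrho}$ an associated per-unit capital allocation principle. Then $\pi^{r_\varrho}$ is suitable for
performance measurement if, for all $\lambda \in \Lambda$, we have

$$\frac{\partial}{\partial \lambda_i}\Big(\frac{-E(L(\lambda))}{r_\varrho(\lambda)}\Big)
  \begin{cases}
    > 0, & \text{if } \dfrac{-E(L_i)}{\pi_i^{r_\varrho}(\lambda)} > \dfrac{-E(L(\lambda))}{r_\varrho(\lambda)}, \\[2ex]
    < 0, & \text{if } \dfrac{-E(L_i)}{\pi_i^{r_\varrho}(\lambda)} < \dfrac{-E(L(\lambda))}{r_\varrho(\lambda)}.
  \end{cases}$$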


In words, this says that if the performance of investment opportunity $i$, as measured
by its per-unit return divided by its per-unit risk capital, is better (respectively,
worse) than the performance of the overall portfolio, then increasing (respectively,
decreasing) the weight $\lambda_i$ of that investment opportunity by a small amount improves
the overall performance of the portfolio. Tasche then proves the following result,
for the proof of which we refer to the original paper.


Proposition 6.29. Under the assumptions of Definition 6.28, the only per-unit capi- 
tal allocation principle suitable for performance measurement is the Euler principle. 


Fairness considerations. Another justification for the Euler principle was given
by Denault (2001). His approach uses cooperative game theory and is based on
the notion of “fairness”. Assume that the risk-measure function $r_\varrho$ derives from a
coherent risk measure $\varrho$. In that case, since $\varrho(L) \le \sum_{i=1}^d \varrho(L_i)$, the overall risk
capital required for the portfolio is no larger than the sum of the risk capital required
for the business units on a stand-alone basis. Fairness now means that each business
unit profits from this diversification benefit, in the sense that $AC_i \le \varrho(L_i)$. In the
next definition we slightly extend this intuitive notion of fairness.

Definition 6.30. Given a coherent risk measure $\varrho$ with associated risk-measure
function $r_\varrho$, a per-unit capital allocation principle $\pi^{r_\varrho}$ is said to be fair if, for all
$\lambda \in \Lambda$ and all $\gamma \in [0, 1]^d$, the following inequality holds:

$$\sum_{i=1}^d \gamma_i \lambda_i \pi_i^{r_\varrho}(\lambda) \le r_\varrho(\gamma_1 \lambda_1, \ldots, \gamma_d \lambda_d). \tag{6.28}$$


Note that, by the definition of a per-unit capital allocation principle in (6.20), we
have equality in (6.28) if we take $\gamma = \mathbf{1}$. The economic interpretation of (6.28)
is straightforward for a vector $\gamma \in \{0, 1\}^d$ satisfying $\gamma_i = 1_{\{i \in N\}}$, where
$N \subset \{1, \ldots, d\}$ is a subset of the investment opportunities. In that case the left-hand side
of (6.28) gives the combined capital that is allocated to the investment opportunities
in the set $N$ given that the overall portfolio is represented by the vector $\lambda$ with
loss $L(\lambda) = \sum_{i=1}^d \lambda_i L_i$. The right-hand side is the combined capital allocated to
the opportunities in the set $N$ on a stand-alone basis, i.e. in a portfolio with no
investments in the opportunities $N^c := \{1, \ldots, d\} \setminus N$ and loss given by $\sum_{i \in N} \lambda_i L_i$.

Since $\varrho$ is coherent and, in particular, subadditive, we have

$$\varrho\Big(\sum_{i=1}^d \lambda_i L_i\Big) \le \varrho\Big(\sum_{i \in N} \lambda_i L_i\Big) + \varrho\Big(\sum_{i \in N^c} \lambda_i L_i\Big),$$

which essentially says that the investments in $N$ enjoy a diversification benefit by
being part of the overall portfolio represented by $\lambda$. Fairness suggests that they
should profit from this benefit by being allocated a smaller amount of capital than
they would have on a stand-alone basis; this is exactly the content of (6.28).

The interpretation of (6.28) for general $\gamma \in [0, 1]^d$ is more involved, but
perhaps easiest if we use the interpretation that the $L_i$ represent losses for different
lines of business. We introduce the portfolio $\tilde\lambda = (\gamma_1 \lambda_1, \ldots, \gamma_d \lambda_d)'$ and note that
it represents a scaling back of activity across the firm with respect to the original
portfolio $\lambda$. We can rewrite (6.28) as

$$\sum_{i=1}^d \tilde\lambda_i \pi_i^{r_\varrho}(\lambda) \le \sum_{i=1}^d \tilde\lambda_i \pi_i^{r_\varrho}(\tilde\lambda).$$

The left-hand side represents the overall capital allocated to the scaled-back portfolio
considered as part of the original portfolio. The right-hand side represents the overall
capital allocated to the scaled-back portfolio considered as a stand-alone entity. If
the inequality were the other way round, there would be a systematic incentive for
business units to scale back their activities.

Translating a game-theoretical result of Aubin (1979) into the context of capital
allocation with a coherent risk measure, Denault (2001) shows that for a
differentiable risk-measure function $r_\varrho$ that is derived from a coherent risk measure $\varrho$, the
only fair allocation principle is the Euler principle. Obviously, this gives additional
support for using the Euler principle if one works in the realm of coherent risk
measures.

From a practical point of view, the use of expected shortfall and expected shortfall
contributions might be a reasonable choice in many application areas, particularly
for credit risk management and loan pricing (see Notes and Comments, where this
issue is discussed further).


Notes and Comments 


A broad, non-technical discussion of capital allocation and performance measure- 
ment is to be found in Matten (2000). The term “Euler principle” seems to have 
been first used in Patrik, Bernegger and Rüegg (1999). The result (6.24) is found in 
Gourieroux and Scaillet (2000) and Tasche (2000); the former paper assumes that the 
losses have a joint density and the latter gives a slightly more general result as well 
as technical details concerning the differentiability of the VaR and ES risk measures 
with respect to the portfolio composition. Differentiability of the coherent premium 
principle of Section 6.1.3 is discussed in Fischer (2003). The derivation of allocation 
principles from properties of risk measures is also to be found in Goovaerts, Dhaene 
and Kaas (2003) and Goovaerts, van den Boor and Laeven (2005). 

For the arguments concerning suitability of risk measures for performance mea- 
surement, see Tasche (1999). The game-theoretic approach to allocation is found 
in Denault (2001); see also Kalkbrener (2005) for similar arguments. For an early 
contribution on game theory applied to cost allocation in an insurance context, see 
Lemaire (1984). 

Applications to credit risk are found in Kalkbrener, Lotter and Overbeck (2004) 
and Merino and Nyfeler (2003); these make strong arguments in favour of the use of 
expected shortfall contributions. However, Pfeifer (2004) contains some compelling 
examples to show that expected shortfall as a risk measure and expected shortfall 
contributions as an allocation method may have some serious deficiencies when 
used in non-life insurance. The existence of rare, extreme events may lead to absurd 
capital allocations when based on expected shortfall. The reader is therefore urged 
to reflect carefully before settling on a specific risk measure and allocation principle. 
It may also be questionable to base a “coherent” risk-sensitive capital allocation on 
formal criteria only; for further details on this see Koryciorz (2004). 
7 Extreme Value Theory


Much of this chapter is based on the presentation of extreme value theory (EVT) 
in Embrechts, Klüppelberg and Mikosch (1997) (henceforth EKM) and whenever
theoretical detail is missing the reader should consult that text. Our intention here 
is to provide more information about the statistical methods of EVT than is given 
in EKM, while briefly summarizing the theoretical ideas on which the statistical 
methods are based. 

Broadly speaking, there are two main kinds of model for extreme values. The 
most traditional models are the block maxima models described in Section 7.1: these 
are models for the largest observations collected from large samples of identically 
distributed observations. 

A more modern and powerful group of models are those for threshold exceed- 
ances, described in Section 7.2. These are models for all large observations that 
exceed some high level, and are generally considered to be the most useful for 
practical applications, due to their more efficient use of the (often limited) data on 
extreme outcomes. 

Section 7.3 is a shorter, theoretical section providing more information about 
the tails of some of the distributions and models that are prominent in this book, 
including the tails of normal variance mixture models and strictly stationary GARCH 
models. 

Sections 7.5 and 7.6 provide a concise summary of the more important ideas in 
multivariate extreme value theory; they deal, respectively, with multivariate maxima 
and multivariate threshold exceedances. The novelty of these sections is that the 
ideas are presented as far as possible using the copula methodology of Chapter 5. 
The style is similar to Sections 7.1 and 7.2, with the main results being mostly stated 
without proof and an emphasis being given to examples relevant for applications. 


7.1 Maxima 


To begin with we consider a sequence of iid rvs $(X_i)_{i \in \mathbb{N}}$ representing financial losses.
These may have a variety of interpretations, such as operational losses, insurance 
losses and losses on a credit portfolio over fixed time intervals. Later we relax the 
assumption of independence and consider that the rvs form a strictly stationary time 
series of dependent losses; they might be (negative) returns on an investment in a 
single stock, an index, or a portfolio of investments. 
7.1.1 Generalized Extreme Value Distribution


Convergence of sums. The role of the generalized extreme value (GEV)
distribution in the theory of extremes is analogous to that of the normal distribution
(and more generally the stable laws) in the central limit theory for sums of rvs.
Assuming that the underlying rvs $X_1, X_2, \ldots$ are iid with a finite variance and
writing $S_n = X_1 + \cdots + X_n$ for the sum of the first $n$ rvs, the standard version of the
central limit theorem (CLT) says that appropriately normalized sums $(S_n - a_n)/b_n$
converge in distribution to the standard normal distribution as $n$ goes to infinity. The
appropriate normalization uses sequences of normalizing constants $(a_n)$ and $(b_n)$
defined by $a_n = nE(X_1)$ and $b_n = \sqrt{n \operatorname{var}(X_1)}$. In mathematical notation we have

$$\lim_{n \to \infty} P\Big(\frac{S_n - a_n}{b_n} \le x\Big) = \Phi(x), \quad x \in \mathbb{R}.$$


Convergence of maxima. Classical EVT is concerned with limiting distributions
for normalized maxima $M_n = \max(X_1, \ldots, X_n)$ of iid rvs; we refer to these as block
maxima. The only possible non-degenerate limiting distributions for normalized
block maxima are in the GEV family.


Definition 7.1 (the generalized extreme value (GEV) distribution). The df of the
(standard) GEV distribution is given by

$$H_\xi(x) = \begin{cases}
  \exp(-(1 + \xi x)^{-1/\xi}), & \xi \ne 0, \\
  \exp(-e^{-x}), & \xi = 0,
\end{cases}$$

where $1 + \xi x > 0$. A three-parameter family is obtained by defining
$H_{\xi,\mu,\sigma}(x) := H_\xi((x - \mu)/\sigma)$ for a location parameter $\mu \in \mathbb{R}$ and a scale parameter $\sigma > 0$.


The parameter $\xi$ is known as the shape parameter of the GEV distribution and
$H_\xi$ defines a type of distribution, meaning a family of distributions specified up to
location and scaling (see Section A.1.1 for a formal definition). The extreme value
distribution in Definition 7.1 is generalized in the sense that the parametric form
subsumes three types of distribution which are known by other names according to
the value of $\xi$: when $\xi > 0$ the distribution is a Fréchet distribution; when $\xi = 0$
it is a Gumbel distribution; when $\xi < 0$ it is a Weibull distribution. We also note
that for fixed $x$ we have $\lim_{\xi \to 0} H_\xi(x) = H_0(x)$ (from either side) so that the
parametrization in Definition 7.1 is continuous in $\xi$, which facilitates the use of this
distribution in statistical modelling.
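
To make Definition 7.1 concrete, the following Python sketch evaluates the three-parameter GEV df and treats the Gumbel case as the limit $\xi \to 0$. The function name `gev_cdf` and the numerical tolerance used to switch to the Gumbel formula are our own assumptions; values outside the support are set to 0 or 1 according to which endpoint has been passed.

```python
import math


def gev_cdf(x, xi=0.0, mu=0.0, sigma=1.0):
    """Df H_{xi,mu,sigma}(x) of the GEV distribution as in Definition 7.1."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel case xi = 0.
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the left endpoint when xi > 0 (Frechet),
        # above the finite right endpoint when xi < 0 (Weibull).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


# The three cases shown in Figure 7.1, evaluated at x = 1.
values_at_one = {xi: gev_cdf(1.0, xi) for xi in (-0.5, 0.0, 0.5)}
```

Evaluating `gev_cdf(1.0, xi)` for very small positive or negative `xi` gives values close to the Gumbel value, in line with the continuity of the parametrization in $\xi$.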

The df and density of the GEV distribution are shown in Figure 7.1 for the three
cases $\xi = 0.5$, $\xi = 0$ and $\xi = -0.5$, corresponding to Fréchet, Gumbel and Weibull
types, respectively. Observe that the Weibull distribution is a short-tailed distribution
with a so-called finite right endpoint. The right endpoint of a distribution will be
denoted by $x_F = \sup\{x \in \mathbb{R} : F(x) < 1\}$. The Gumbel and Fréchet distributions
have infinite right endpoints, but the decay of the tail of the Fréchet distribution is
much slower than that of the Gumbel distribution.

Figure 7.1. (a) The df of a standard GEV distribution in three cases: the solid line
corresponds to $\xi = 0$ (Gumbel); the dotted line is $\xi = 0.5$ (Fréchet); and the dashed line is
$\xi = -0.5$ (Weibull). (b) Corresponding densities. In all cases $\mu = 0$ and $\sigma = 1$.

Suppose that block maxima $M_n$ of iid rvs converge in distribution under an
appropriate normalization. Recalling that $P(M_n \le x) = F^n(x)$, we observe that this
convergence means that there exist sequences of real constants $(d_n)$ and $(c_n)$, where
$c_n > 0$ for all $n$, such that

$$\lim_{n \to \infty} P((M_n - d_n)/c_n \le x) = \lim_{n \to \infty} F^n(c_n x + d_n) = H(x) \tag{7.1}$$

for some non-degenerate df $H(x)$. The role of the GEV distribution in the study of
maxima is formalized by the following definition and theorem.


Definition 7.2 (maximum domain of attraction). If (7.1) holds for some
non-degenerate df $H$, then $F$ is said to be in the maximum domain of attraction of $H$,
written $F \in \mathrm{MDA}(H)$.


Theorem 7.3 (Fisher–Tippett, Gnedenko). If $F \in \mathrm{MDA}(H)$ for some
non-degenerate df $H$ then $H$ must be a distribution of type $H_\xi$, i.e. a GEV distribution.


Remarks 7.4. 


(1) If convergence of normalized maxima takes place, the type of the limiting
distribution (as specified by $\xi$) is uniquely determined, although the location and
scaling of the limit law ($\mu$ and $\sigma$) depend on the exact normalizing sequences
chosen; this is guaranteed by the so-called “convergence to types theorem”
(EKM, p. 554). It is always possible to choose these sequences such that the
limit appears in the standard form $H_\xi$.


(2) By non-degenerate df we mean a limiting distribution which is not concen- 
trated on a single point. 


Examples. We calculate two examples to show how the GEV limit emerges for 
two well-known underlying distributions and appropriately chosen normalizing 
sequences. To discover how normalizing sequences may be constructed in general 
we refer to Section 3.3 of EKM. 
Example 7.5 (exponential distribution). If the underlying distribution is an
exponential distribution with df $F(x) = 1 - \exp(-\beta x)$ for $\beta > 0$ and $x \ge 0$, then
by choosing normalizing sequences $c_n = 1/\beta$ and $d_n = \ln n/\beta$ we can directly
calculate the limiting distribution of maxima using (7.1). We get

$$F^n(c_n x + d_n) = \Big(1 - \frac{1}{n}\exp(-x)\Big)^n, \quad x \ge -\ln n,$$

$$\lim_{n \to \infty} F^n(c_n x + d_n) = \exp(-e^{-x}), \quad x \in \mathbb{R},$$

from which we conclude that $F \in \mathrm{MDA}(H_0)$.
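
The convergence in Example 7.5 can be checked numerically. The sketch below compares the exact df of the normalized maximum of $n$ exponential rvs with the Gumbel df on a grid of points; the function names, the grid and the block sizes are illustrative assumptions of ours.

```python
import math


def exp_max_cdf(x, n, beta=1.0):
    """Exact df F^n(c_n x + d_n) for iid Exp(beta) rvs, c_n = 1/beta, d_n = ln(n)/beta."""
    y = x / beta + math.log(n) / beta  # the point c_n * x + d_n
    if y < 0:
        return 0.0  # below the support of the exponential distribution
    return (1.0 - math.exp(-beta * y)) ** n


def gumbel_cdf(x):
    """Standard Gumbel df H_0(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


grid = [k / 10 for k in range(-30, 51)]  # x from -3.0 to 5.0
errors = {
    n: max(abs(exp_max_cdf(x, n) - gumbel_cdf(x)) for x in grid)
    for n in (10, 100, 1000)
}
```

The largest absolute discrepancy on the grid shrinks as the block size grows, roughly in proportion to $1/n$.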


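Example 7.6 (Pareto distribution). If the underlying distribution is a Pareto
distribution ($\mathrm{Pa}(\alpha, \kappa)$) with df $F(x) = 1 - (\kappa/(\kappa + x))^\alpha$ for $\alpha > 0$, $\kappa > 0$ and $x \ge 0$,
we can take normalizing sequences $c_n = \kappa n^{1/\alpha}/\alpha$ and $d_n = \kappa n^{1/\alpha} - \kappa$. Using (7.1)
we get

$$F^n(c_n x + d_n) = \Big(1 - \frac{1}{n}\Big(1 + \frac{x}{\alpha}\Big)^{-\alpha}\Big)^n, \quad 1 + \frac{x}{\alpha} \ge n^{-1/\alpha},$$

$$\lim_{n \to \infty} F^n(c_n x + d_n) = \exp\Big(-\Big(1 + \frac{x}{\alpha}\Big)^{-\alpha}\Big), \quad 1 + \frac{x}{\alpha} > 0,$$

from which we conclude that $F \in \mathrm{MDA}(H_{1/\alpha})$.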
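
The same kind of numerical check can be made for Example 7.6. The sketch below evaluates the exact df of the normalized Pareto maximum and the limit $H_{1/\alpha}$; the function names and the parameter values $\alpha = 2$, $\kappa = 1$ are hypothetical choices for illustration only.

```python
import math


def pareto_max_cdf(x, n, alpha, kappa=1.0):
    """Exact df F^n(c_n x + d_n) for iid Pa(alpha, kappa) rvs with the
    normalizing sequences of Example 7.6."""
    c_n = kappa * n ** (1.0 / alpha) / alpha
    d_n = kappa * n ** (1.0 / alpha) - kappa
    y = c_n * x + d_n
    if y < 0:
        return 0.0  # below the support of the Pareto distribution
    return (1.0 - (kappa / (kappa + y)) ** alpha) ** n


def gev_limit_cdf(x, alpha):
    """Limit H_{1/alpha}(x) = exp(-(1 + x/alpha)^(-alpha)) on 1 + x/alpha > 0."""
    t = 1.0 + x / alpha
    return math.exp(-t ** (-alpha)) if t > 0 else 0.0


alpha = 2.0
gaps = [
    abs(pareto_max_cdf(x, 10**6, alpha) - gev_limit_cdf(x, alpha))
    for x in (-1.0, 0.0, 1.0, 5.0)
]
```

Note that the scale parameter $\kappa$ drops out of the normalized df entirely, as the algebra in the example shows.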


Convergence of minima. The limiting theory for convergence of maxima
encompasses the limiting behaviour of minima using the identity

$$\min(X_1, \ldots, X_n) = -\max(-X_1, \ldots, -X_n). \tag{7.2}$$

It is not difficult to see that normalized minima of iid samples with df $F$ will
converge in distribution if the df $\tilde F(x) = 1 - F(-x)$, which is the df of the rvs
$-X_1, \ldots, -X_n$, is in the maximum domain of attraction of an extreme value
distribution. Writing $M_n^* = \max(-X_1, \ldots, -X_n)$ and assuming that $\tilde F \in \mathrm{MDA}(H_\xi)$
we have

$$\lim_{n \to \infty} P\Big(\frac{M_n^* - d_n}{c_n} \le x\Big) = H_\xi(x),$$

from which it follows easily, using (7.2), that

$$\lim_{n \to \infty} P\Big(\frac{\min(X_1, \ldots, X_n) + d_n}{c_n} \le x\Big) = 1 - H_\xi(-x).$$

Thus appropriate limits for minima are distributions of type $1 - H_\xi(-x)$. For a
symmetric distribution $F$ we have $\tilde F(x) = F(x)$, so that if $H_\xi$ is the limiting type of
distribution for maxima for a particular value of $\xi$, then $1 - H_\xi(-x)$ is the limiting
type of distribution for minima.


7.1.2 Maximum Domains of Attraction 


For most applications it is sufficient to note that essentially all the common
continuous distributions of statistics or actuarial science are in $\mathrm{MDA}(H_\xi)$ for some value
of $\xi$. In this section we consider the issue of which underlying distributions lead to
which limits for maxima.


The Fréchet case. The distributions that lead to the Fréchet limit $H_\xi(x)$ for $\xi > 0$
have a particularly elegant characterization involving slowly varying or regularly
varying functions.


Definition 7.7 (slowly varying and regularly varying functions).

(i) A positive, Lebesgue-measurable function $L$ on $(0, \infty)$ is slowly varying at $\infty$
if

$$\lim_{x \to \infty} \frac{L(tx)}{L(x)} = 1, \quad t > 0.$$

(ii) A positive, Lebesgue-measurable function $h$ on $(0, \infty)$ is regularly varying
at $\infty$ with index $\rho \in \mathbb{R}$ if

$$\lim_{x \to \infty} \frac{h(tx)}{h(x)} = t^\rho, \quad t > 0.$$


Slowly varying functions are functions which, in comparison with power functions,
change relatively slowly for large $x$, an example being the logarithm $L(x) = \ln(x)$.
Regularly varying functions are functions which can be represented by power
functions multiplied by slowly varying functions, i.e. $h(x) = x^\rho L(x)$.
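
A quick numerical look at Definition 7.7 may help. The sketch below evaluates the ratio $h(tx)/h(x)$ for the slowly varying function $\ln x$ and for the regularly varying function $x^{-2}\ln x$ (index $\rho = -2$). The helper name `variation_ratio` and the particular test function are our own illustrative choices.

```python
import math


def variation_ratio(h, t, x):
    """The ratio h(t x) / h(x) appearing in Definition 7.7."""
    return h(t * x) / h(x)


# Slowly varying example: the logarithm. The ratio tends to 1, but only slowly.
log_ratios = [variation_ratio(math.log, 2.0, 10.0 ** k) for k in (2, 4, 8, 16)]


def h(x):
    """Regularly varying example with index rho = -2 (power times logarithm)."""
    return x ** -2.0 * math.log(x)


# For large x this ratio should be close to t^rho = 2^(-2) = 0.25.
h_ratio = variation_ratio(h, 2.0, 1e16)
```

The slow approach of the logarithmic ratio to 1 illustrates why slowly varying factors can be hard to detect in samples of moderate size.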


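Theorem 7.8 (Fréchet MDA, Gnedenko). For $\xi > 0$,

$$F \in \mathrm{MDA}(H_\xi) \iff \bar F(x) = x^{-1/\xi} L(x) \tag{7.3}$$

for some function $L$ slowly varying at $\infty$, where $\bar F(x) = 1 - F(x)$ denotes the tail of $F$.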


This means that distributions giving rise to the Fréchet case are distributions with
tails that are regularly varying functions with a negative index of variation. Their
tails decay essentially like a power function and the rate of decay $\alpha = 1/\xi$ is often
referred to as the tail index of the distribution.

These distributions are the most studied distributions in EVT and they are of
particular interest in financial applications because they are heavy-tailed distributions
with infinite higher moments. If $X$ is a non-negative rv whose df $F$ is an element
of $\mathrm{MDA}(H_\xi)$ for $\xi > 0$, then it may be shown that $E(X^k) = \infty$ for $k > 1/\xi$
(EKM, p. 568). If, for some small $\varepsilon > 0$, the distribution is in $\mathrm{MDA}(H_{(1/2)+\varepsilon})$, it is
an infinite-variance distribution, and if the distribution is in $\mathrm{MDA}(H_{(1/4)+\varepsilon})$, it is a
distribution with infinite fourth moment.


Example 7.9 (Pareto distribution). In Example 7.6 we verified by direct
calculation that normalized maxima of iid Pareto variates converge to a Fréchet distribution.
Observe that the tail of the Pareto df in (A.13) may be written $\bar F(x) = x^{-\alpha} L(x)$,
where it may be easily checked that $L(x) = (\kappa^{-1} + x^{-1})^{-\alpha}$ is a slowly varying
function; indeed, as $x \to \infty$, $L(x)$ converges to the constant $\kappa^\alpha$. Thus we verify
that the Pareto df has the form (7.3).

Further examples of distributions giving rise to the Fréchet limit for maxima
include the Fréchet distribution itself, inverse gamma, Student t, loggamma, F
and Burr distributions. We will provide further demonstrations for some of these
distributions in Section 7.3.1.


The Gumbel case. The characterization of distributions in this class is more
complicated than in the Fréchet class. We have seen in Example 7.5 that the exponential
distribution is in the Gumbel class and, more generally, it could be said that the
distributions in this class have tails that have an essentially exponential decay. A
positive-valued rv with a df in $\mathrm{MDA}(H_0)$ has finite moments of any positive order,
i.e. $E(X^k) < \infty$ for every $k > 0$ (EKM, p. 148).

However, there is a great deal of variety in the tails of distributions in this class, so, 
for example, both the normal and the lognormal distributions belong to the Gumbel 
class (EKM, pp. 145-147). The normal distribution, as discussed in Section 3.1.4, is 
thin tailed, but the lognormal distribution has much heavier tails and we would need 
to collect a lot of data from the lognormal distribution before we could distinguish 
its tail behaviour from that of a distribution in the Fréchet class. 

In financial modelling it is often erroneously assumed that the only interesting 
models for financial returns are the power-tailed distributions of the Fréchet class. 
The Gumbel class is also interesting because it contains many distributions with 
much heavier tails than the normal, even if these are not regularly varying power 
tails. Examples are hyperbolic and generalized hyperbolic distributions (with the
exception of the special boundary case that is Student t).

Other distributions in $\mathrm{MDA}(H_0)$ include the gamma, chi-squared, standard
Weibull (to be distinguished from the Weibull special case of the GEV distribu- 
tion) and Benktander type I and II distributions (which are popular actuarial loss 
distributions) and the Gumbel itself. We provide demonstrations for some of these 
examples in Section 7.3.2. 


The Weibull case. This is perhaps the least important case for financial modelling, 
at least in the area of market risk, since the distributions in this class all have finite 
right endpoints. Although all potential financial and insurance losses are, in practice, 
bounded, we will still tend to favour models that have infinite support for loss 
modelling. An exception may be in the area of credit risk modelling, where we will 
see in Chapter 8 that probability distributions on the unit interval [0, 1] are very 
useful. A characterization of the Weibull class is as follows. 


Theorem 7.10 (Weibull MDA, Gnedenko). For $\xi < 0$,

$$F \in \mathrm{MDA}(H_\xi) \iff x_F < \infty \text{ and } \bar F(x_F - x^{-1}) = x^{1/\xi} L(x)$$

for some function $L$ slowly varying at $\infty$.


It can be shown (EKM, p. 137) that a beta distribution with density $f_{\alpha,\beta}$ as given
in (A.4) is in $\mathrm{MDA}(H_{-1/\beta})$. This includes the special case of the uniform distribution
for $\beta = \alpha = 1$.
7.1.3 Maxima of Strictly Stationary Time Series


The standard theory of the previous sections concerns maxima of iid sequences. 
With financial time series in mind, we now look briefly at the theory for maxima 
of strictly stationary time series and find that the same types of limiting distribution 
apply. 

In this section let $(X_i)_{i \in \mathbb{Z}}$ denote a strictly stationary time series with
stationary distribution $F$ and let $(\tilde X_i)_{i \in \mathbb{N}}$ denote the associated iid process, i.e. a
strict white noise process with the same df $F$. Let $M_n = \max(X_1, \ldots, X_n)$ and
$\tilde M_n = \max(\tilde X_1, \ldots, \tilde X_n)$ denote block maxima of the original series and the iid
series, respectively.

For many processes $(X_i)_{i \in \mathbb{N}}$, it may be shown that there exists a real number $\theta$
in $(0, 1]$ such that

$$\lim_{n \to \infty} P\{(\tilde M_n - d_n)/c_n \le x\} = H(x) \tag{7.4}$$

for a non-degenerate limit $H(x)$ if and only if

$$\lim_{n \to \infty} P\{(M_n - d_n)/c_n \le x\} = H^\theta(x). \tag{7.5}$$

For such processes this value $\theta$ is known as the extremal index of the process (not to be
confused with the tail index of distributions in the Fréchet class). A formal definition
is more technical (see Notes and Comments) but the basic ideas behind (7.4) and (7.5)
are easily explained.

For processes with an extremal index, normalized block maxima converge in
distribution provided that maxima of the associated iid process converge in
distribution: that is, provided the underlying distribution $F$ is in $\mathrm{MDA}(H_\xi)$ for some $\xi$.
Moreover, since $H_\xi^\theta(x)$ can be easily verified to be a distribution of the same type as
$H_\xi(x)$, the limiting distribution of the normalized block maxima of the dependent
series is a GEV distribution with exactly the same $\xi$ parameter as the limit for the
associated iid data; only the location and scaling of the distribution may change.

Writing $u = c_n x + d_n$ we observe that, for large enough $n$, (7.4) and (7.5) imply
that

$$P(M_n \le u) \approx P^\theta(\tilde M_n \le u) = F^{n\theta}(u), \tag{7.6}$$

so that for $u$ large the probability distribution of the maximum of $n$ observations
from the time series with extremal index $\theta$ can be approximated by the distribution
of the maximum of $n\theta \le n$ observations from the associated iid series. In a sense,
$n\theta$ can be thought of as counting the number of roughly independent clusters of
observations in $n$ observations, and $\theta$ is often interpreted as the reciprocal of the
mean cluster size.

Not every strictly stationary process has an extremal index (see EKM, p. 418, 
for a counterexample) but, for the kinds of time series processes that interest us in 
financial modelling, an extremal index generally exists. Essentially, we only have 
to distinguish between the cases when $\theta = 1$ and the cases when $\theta < 1$: for the
former there is no tendency to cluster at high levels and large sample maxima from 
the time series behave exactly like maxima from similarly sized iid samples; for the 
latter we must be aware of a tendency for extreme values to cluster. 
Table 7.1. Approximate values of the extremal index as a function of
the parameter $\alpha_1$ for the ARCH(1) process in (4.24).


$\alpha_1$    0.1     0.3     0.5     0.7     0.9
$\theta$      0.999   0.939   0.835   0.721   0.612


• Strict white noise processes (iid rvs) have extremal index $\theta = 1$.

• ARMA processes with Gaussian strict white noise innovations have $\theta = 1$
(EKM, pp. 216-218). However, if the innovation distribution is in $\mathrm{MDA}(H_\xi)$
for $\xi > 0$, then $\theta < 1$ (EKM, pp. 415, 416).

• ARCH and GARCH processes have $\theta < 1$ (EKM, pp. 476-480).


The final fact is particularly relevant to our financial applications, since we saw 
in Chapter 4 that ARCH and GARCH processes provide good models for many 
financial return series. 


Example 7.11 (the extremal index of the ARCH(1) process). In Table 7.1 we
reproduce some results from de Haan et al. (1989), who calculate approximate
values for the extremal index of the ARCH(1) process (see Definition 4.16) using a
Monte Carlo simulation approach. Clearly, the stronger the ARCH effect (that is, the
magnitude of the parameter $\alpha_1$), the greater the tendency of the process to cluster.
For a process with parameter 0.9 the extremal index value $\theta = 0.612$ is interpreted
as suggesting that the average cluster size is $1/\theta \approx 1.63$.
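
The interpretation of the extremal index can be illustrated with a few lines of Python. The sketch below turns the values of Table 7.1 into implied mean cluster sizes and evaluates the approximation (7.6) for a given value of $F(u)$. The names `TABLE_7_1`, `mean_cluster_size` and `prob_max_below`, and the inputs $F(u) = 0.999$ and $n = 260$, are hypothetical choices of ours.

```python
# Extremal index values from Table 7.1, keyed by the ARCH(1) parameter alpha_1.
TABLE_7_1 = {0.1: 0.999, 0.3: 0.939, 0.5: 0.835, 0.7: 0.721, 0.9: 0.612}


def mean_cluster_size(theta):
    """Reciprocal of the extremal index, read as the mean cluster size."""
    return 1.0 / theta


def prob_max_below(F_u, n, theta=1.0):
    """Approximation (7.6): P(M_n <= u) is roughly F(u)^(n * theta), with F_u = F(u)."""
    return F_u ** (n * theta)


cluster_sizes = {a1: mean_cluster_size(th) for a1, th in TABLE_7_1.items()}

# Hypothetical comparison: one year of 260 daily observations with F(u) = 0.999.
p_iid = prob_max_below(0.999, 260)                 # theta = 1
p_arch = prob_max_below(0.999, 260, theta=0.612)   # strongly clustered series
```

Because clustering reduces the number of effectively independent observations, the clustered series is more likely than the iid series to have its maximum below a given high level.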


7.1.4 The Block Maxima Method 


Fitting the GEV distribution. Suppose we have data from an unknown underlying
distribution $F$, which we suppose lies in the domain of attraction of an extreme value
distribution $H_\xi$ for some $\xi$. If the data are realizations of iid variables, or variables
from a process with an extremal index such as GARCH, the implication of the theory
is that the true distribution of the $n$-block maximum $M_n$ can be approximated for
large enough $n$ by a three-parameter GEV distribution $H_{\xi,\mu,\sigma}$.

We make use of this idea by fitting the GEV distribution $H_{\xi,\mu,\sigma}$ to data on the
$n$-block maximum. Obviously we need repeated observations of an $n$-block maximum
and we assume that the data can be divided into m blocks of size n. This makes most 
sense when there are natural ways of blocking the data. The method has its origins in 
hydrology, where, for example, daily measurements of water levels might be divided 
into yearly blocks and the yearly maxima collected. Analogously, we will consider 
financial applications where daily return data (recorded on trading days) are divided 
into yearly (or semesterly or quarterly) blocks and the maximum daily falls within 
these blocks are analysed. 

We denote the block maximum of the $j$th block by $M_{nj}$, so our data are
$M_{n1}, \ldots, M_{nm}$. The GEV distribution can be fitted using various methods, including
maximum likelihood. An alternative is the method of probability-weighted moments
(see Notes and Comments). In implementing maximum likelihood it will be assumed
that the block size $n$ is quite large so that, regardless of whether the underlying data
are dependent or not, the block maxima observations can be taken to be independent.
In this case, writing $h_{\xi,\mu,\sigma}$ for the density of the GEV distribution, the log-likelihood
is easily calculated to be

$$\begin{aligned}
l(\xi, \mu, \sigma; M_{n1}, \ldots, M_{nm})
  &= \sum_{i=1}^m \ln h_{\xi,\mu,\sigma}(M_{ni}) \\
  &= -m \ln \sigma - \Big(1 + \frac{1}{\xi}\Big) \sum_{i=1}^m \ln\Big(1 + \xi \frac{M_{ni} - \mu}{\sigma}\Big)
     - \sum_{i=1}^m \Big(1 + \xi \frac{M_{ni} - \mu}{\sigma}\Big)^{-1/\xi},
\end{aligned}$$

which must be maximized subject to the parameter constraints that $\sigma > 0$ and
$1 + \xi(M_{ni} - \mu)/\sigma > 0$ for all $i$. While this represents an irregular likelihood
problem, due to the dependence of the parameter space on the values of the data,
the consistency and asymptotic efficiency of the resulting MLEs can be established
for the case when $\xi > -\frac{1}{2}$ using results in Smith (1985).

In determining the number and size of the blocks (m and n, respectively), a trade- 
off necessarily takes place: roughly speaking, a large value of n leads to a more 
accurate approximation of the block maxima distribution by a GEV distribution and 
a low bias in the parameter estimates; a large value of m gives more block maxima 
data for the ML estimation and leads to a low variance in the parameter estimates. 
Note also that, in the case of dependent data, somewhat larger block sizes than 
are used in the iid case may be advisable; dependence generally has the effect that 
convergence to the GEV distribution is slower, since the effective sample size is $n\theta$,
which is smaller than n. 
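
As a minimal sketch of how the GEV log-likelihood for block maxima might be coded, the Python function below evaluates it for given parameters and returns minus infinity when a parameter constraint is violated, so that it could be handed to a generic numerical optimizer. The function name `gev_loglik`, the handling of the Gumbel case by a tolerance on $\xi$ and the small data set are our own assumptions; no particular optimizer or fitting routine is implied by the text.

```python
import math


def gev_loglik(xi, mu, sigma, maxima):
    """GEV log-likelihood for block maxima M_n1, ..., M_nm.

    Returns -inf if sigma <= 0 or if 1 + xi * (M - mu) / sigma <= 0 for some
    observation, i.e. if a parameter constraint is violated.
    """
    if sigma <= 0:
        return -math.inf
    total = -len(maxima) * math.log(sigma)
    for x in maxima:
        z = (x - mu) / sigma
        if abs(xi) < 1e-9:
            # Gumbel limit xi -> 0: log-density is -ln(sigma) - z - exp(-z).
            total += -z - math.exp(-z)
        else:
            t = 1.0 + xi * z
            if t <= 0:
                return -math.inf
            total += -(1.0 + 1.0 / xi) * math.log(t) - t ** (-1.0 / xi)
    return total


# Hypothetical block maxima, for illustration only.
data = [1.2, 2.5, 3.1, 4.8]
ll_frechet = gev_loglik(0.3, 2.0, 1.0, data)
ll_gumbel = gev_loglik(0.0, 2.0, 1.0, data)
```

Returning minus infinity outside the admissible region is one simple way of respecting the data-dependent parameter space when maximizing numerically.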


Example 7.12 (block maxima analysis of S&P return data). Suppose we turn 
the clock back and imagine it is the early evening of Friday 16 October 1987. An 
unusually turbulent week in the equity markets has seen the S&P 500 index fall 
by 9.21%. On that Friday alone the index is down 5.25% on the previous day, the 
largest one-day fall since 1962. 

We fit the GEV distribution to annual maximum daily percentage falls in value 
for the S&P index. Using data going back to 1960, shown in Figure 7.2, gives us 
28 observations of the annual maximum fall (including the latest observation from 
the incomplete year 1987). The estimated parameter values are $\hat\xi = 0.27$, $\hat\mu = 2.04$
and $\hat\sigma = 0.72$ with standard errors 0.21, 0.16 and 0.14, respectively. Thus the fitted
distribution is a heavy-tailed Fréchet distribution with an infinite fourth moment,
suggesting that the underlying distribution is heavy-tailed. Note that the standard
errors imply considerable uncertainty in our analysis, as might be expected with only
28 observations of maxima. In fact, in a likelihood ratio test of the null hypothesis
that a Gumbel model fits the data ($H_0 : \xi = 0$), the null hypothesis cannot be
rejected.

To increase the number of blocks we also fit a GEV model to 56 semesterly 
maxima and obtain the parameter estimates $\hat\xi = 0.36$, $\hat\mu = 1.65$ and $\hat\sigma = 0.54$ with
standard errors 0.15, 0.09 and 0.08. This model has an even heavier tail, and the null 
hypothesis that a Gumbel model is adequate is now rejected. 
Figure 7.2. (a) S&P percentage returns for the period 1960 to 16 October 1987. (b) Annual
maxima of daily falls in the index; superimposed is an estimate of the 10-year return level 
with associated 95% confidence interval (dotted lines). (c) Semesterly maxima of daily falls 
in the index; superimposed is an estimate of the 20-semester return level with associated 95% 
confidence interval. See Examples 7.12 and 7.15 for full details. 


Return levels and stress losses. The fitted GEV model can be used to analyse stress 
losses and we focus here on two possibilities: in the first approach we define the 
frequency of occurrence of the stress event and estimate its magnitude, this being 
known as the return-level estimation problem; in the second approach we define the 
size of the stress event and estimate the frequency of its occurrence, this being the 
return-period problem. 


Definition 7.13 (return level). Let H denote the df of the true distribution of 
the n-block maximum. The k n-block return level is $r_{n,k} = q_{1-1/k}(H)$, i.e. the 
$(1 - 1/k)$-quantile of H. 


The k n-block return level can be roughly interpreted as that level which is 
exceeded in one out of every k n-blocks on average. For example, the 10-trading-year 
return level $r_{260,10}$ is that level which is exceeded in one out of every 10 years on 
average. (In the notation we assume that every year has 260 trading days, although 
this is only an average and there will be slight differences from year to year.) Using 
our fitted model we would estimate a return level by 

$$\hat{r}_{n,k} = H^{-1}_{\hat{\xi},\hat{\mu},\hat{\sigma}}\Bigl(1 - \frac{1}{k}\Bigr) = \hat{\mu} + \frac{\hat{\sigma}}{\hat{\xi}}\biggl(\Bigl(-\ln\Bigl(1 - \frac{1}{k}\Bigr)\Bigr)^{-\hat{\xi}} - 1\biggr). \tag{7.7}$$


Definition 7.14 (return period). Let H denote the df of the true distribution of 
the n-block maximum. The return period of the event $\{M_n > u\}$ is given by 
$k_{n,u} = 1/\bar{H}(u)$, where $\bar{H} = 1 - H$. 


Observe that the return period $k_{n,u}$ is defined in such a way that the $k_{n,u}$ n-block 
return level is u. In other words, in $k_{n,u}$ n-blocks we would expect to observe a 
single block in which the level u was exceeded. If there was a strong tendency for 
the extreme values to cluster, we might expect to see multiple exceedances of the 
level within that block. Assuming that H is the df of a GEV distribution and using 
our fitted model, we would estimate the return period by $\hat{k}_{n,u} = 1/\bar{H}_{\hat{\xi},\hat{\mu},\hat{\sigma}}(u)$. 

Note that both $\hat{r}_{n,k}$ and $\hat{k}_{n,u}$ are simple functionals of the estimated parameters 
of the GEV distribution. As well as calculating point estimates for these quantities 
we should give confidence intervals that reflect the error in the parameter estimates 
of the GEV distribution. A good method is to base such confidence intervals on the 
likelihood ratio statistic, as described in Section A.3.5. To do this we reparametrize 
the GEV distribution in terms of the quantity of interest. For example, in the case 
of return level, let $\phi = H^{-1}_{\xi,\mu,\sigma}(1 - (1/k))$ and parametrize the GEV distribution by 
$\theta = (\phi, \xi, \sigma)'$ rather than $\theta = (\xi, \mu, \sigma)'$. The maximum likelihood estimate of $\phi$ 
is the estimate (7.7) and a confidence interval can be constructed according to the 
method in Section A.3.5 (see (A.22) in particular). 
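
The two point estimators above are short enough to sketch in code. The following is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the parameter values are the rounded annual and semesterly estimates quoted in Example 7.12, so results can differ slightly from figures computed with unrounded estimates.

```python
import math


def gev_return_level(k, xi, mu, sigma):
    """k n-block return level: the (1 - 1/k)-quantile of the GEV df, as in (7.7)."""
    y = -math.log(1.0 - 1.0 / k)
    if xi == 0.0:
        # Gumbel limit of (7.7) as xi -> 0
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


def gev_return_period(u, xi, mu, sigma):
    """Return period 1 / (1 - H(u)) of the event {M_n > u} under a GEV df H."""
    if xi == 0.0:
        t = math.exp(-(u - mu) / sigma)
    else:
        z = 1.0 + xi * (u - mu) / sigma
        if z <= 0.0:
            # u is outside the support: below it if xi > 0, above it if xi < 0
            return 1.0 if xi > 0 else math.inf
        t = z ** (-1.0 / xi)
    # H(u) = exp(-t), so 1 - H(u) = -expm1(-t), which is stable for small t
    return 1.0 / -math.expm1(-t)


# Rounded estimates from Example 7.12 (annual and semesterly maxima).
annual = dict(xi=0.27, mu=2.04, sigma=0.72)
semester = dict(xi=0.36, mu=1.65, sigma=0.54)

print(gev_return_level(10, **annual))      # 10-year return level (percent)
print(gev_return_level(20, **semester))    # 20-semester return level (percent)
print(gev_return_period(20.5, **annual))   # return period of a 20.5% fall (years)
```

With these rounded inputs the sketch gives values close to the 4.3%, 4.5% and roughly 2100 years reported in Example 7.15. Confidence intervals are not covered here; they require the profile likelihood construction described in the text.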


Example 7.15 (stress losses for S&P return data). We continue Example 7.12 by 
estimating the 10-year return level and the 20-semester return level based on data up 
to 16 October 1987, using (7.7) for the point estimate and the likelihood ratio method 
as described above to get confidence intervals. The point estimator of the 10-year 
return level is 4.3% with a 95% confidence interval of (3.4, 7.1); the point estimator 
of the 20-semester return level is 4.5% with a 95% confidence interval of (3.5, 7.4). 
Clearly, there is some uncertainty about the size of events of this frequency even 
with 28 years or 56 semesters of data. 

The day after the end of our dataset, 19 October 1987, was Black Monday. The 
index fell by the unprecedented amount of 20.5% in one day. This event is well 
outside our confidence interval for a 10-year loss. If we were to estimate a 50-year 
return level (an event beyond our experience if we have 28 years of data), then our 
point estimate would be 7.0 with a confidence interval of (4.7, 22.2), so the 1987 
crash lies close to the upper boundary of our confidence interval for a much rarer 
event. But the 28 maxima are really too few to get a reliable estimate for an event 
as rare as the 50-year event. 

If we turn the problem around and attempt to estimate the return period of a 
20.5% loss, the point estimate is 2100 years (i.e. a 2 millennium event) but the 95% 
confidence interval encompasses everything from 45 years to essentially never! The 
analysis of semesterly maxima gives only moderately more informative results: the 
point estimate is 1400 semesters; the confidence interval runs from 100 semesters 
to 1.6 x 10° semesters. In summary, on 16 October 1987 we simply did not have 
the data to say anything meaningful about an event of this magnitude. This illus- 
trates the inherent difficulties of attempting to quantify events beyond our empirical 
experience. 


Notes and Comments 


The main source for this chapter is Embrechts, Klüppelberg and Mikosch (1997) 
(EKM). Further important texts on EVT include Gumbel (1958), Leadbetter, Lindgren 
and Rootzén (1983), Galambos (1987), Resnick (1987), Falk, Hüsler and Reiss 
(1994), Reiss and Thomas (1997), Coles (2001) and Beirlant et al. (2004). 

The forms of the limit law for maxima were first studied by Fisher and Tippett 
(1928). The subject was brought to full mathematical fruition in the fundamental 
papers of Gnedenko (1941, 1943). The concept of the extremal index, which appears 
in the theory of maxima of stationary series, has a long history. The first mathemat- 
ically precise definition seems to have been given by Leadbetter (1983). See also 
Leadbetter, Lindgren and Rootzén (1983) and Smith and Weissman (1994) for more 
details. The theory required to calculate the extremal index of an ARCH(1) process 
(as in Table 7.1) is found in de Haan et al. (1989) and also in EKM, pp. 473-480. 
For the GARCH(1, 1) process consult Mikosch and Stărică (2000). 

A further difficult task is the statistical estimation of the extremal index from time 
series data under the assumption that these data do indeed come from a process with 
an extremal index. Two general methods known as the blocks and runs methods are 
described in EKM, Section 8.1.3; these methods go back to work of Hsing (1991) 
and Smith and Weissman (1994). Although the estimators have been used in real-world 
data analyses (see, for example, Davison and Smith 1990), it remains true 
that the extremal index is a very difficult parameter to estimate accurately. 

The maximum likelihood fitting of the GEV distribution is described by Hosking 
(1985) and Hosking, Wallis and Wood (1985). Consistency and asymptotic normality 
can be demonstrated for the case $\xi > -0.5$ using results in Smith (1985). 
An alternative method known as probability-weighted moments (PWM) has been 
proposed by Hosking, Wallis and Wood (1985) (see also EKM, pp. 321-323). The 
analysis of block maxima in Examples 7.12 and 7.15 is based on McNeil (1998). 
Analyses of financial data using the block maxima method may also be found in 
Longin (1996), one of the earliest papers to apply EVT methodology to financial 
data. 


7.2 Threshold Exceedances 


The block maxima method discussed in Section 7.1.4 has the major defect that it is 
very wasteful of data; to perform our analyses we retain only the maximum losses in 
large blocks. For this reason it has been largely superseded in practice by methods 
based on threshold exceedances, where we use all data that are extreme in the sense 
that they exceed a particular designated high level. 


7.2.1 Generalized Pareto Distribution 


The main distributional model for exceedances over thresholds is the generalized 
Pareto distribution (GPD). 


Definition 7.16 (GPD). The df of the GPD is given by 


$$G_{\xi,\beta}(x) = \begin{cases} 1 - (1 + \xi x/\beta)^{-1/\xi}, & \xi \neq 0, \\ 1 - \exp(-x/\beta), & \xi = 0, \end{cases} \tag{7.8}$$

where $\beta > 0$, and $x \geq 0$ when $\xi \geq 0$ and $0 \leq x \leq -\beta/\xi$ when $\xi < 0$. The 
parameters $\xi$ and $\beta$ are referred to, respectively, as the shape and scale parameters. 

Figure 7.3. (a) Distribution function of GPD in three cases: the solid line corresponds to 
$\xi = 0$ (exponential); the dotted line to $\xi = 0.5$ (a Pareto distribution); and the dashed line to 
$\xi = -0.5$ (Pareto type II). The scale parameter $\beta$ is equal to 1 in all cases. (b) Corresponding 
densities. 


Like the GEV distribution in Definition 7.1, the GPD is generalized in the sense 
that it contains a number of special cases: when $\xi > 0$ the df $G_{\xi,\beta}$ is that of an 
ordinary Pareto distribution with $\alpha = 1/\xi$ and $\kappa = \beta/\xi$ (see Section A.2.8); when 
$\xi = 0$ we have an exponential distribution; when $\xi < 0$ we have a short-tailed, 
Pareto type II distribution. Moreover, as in the case of the GEV distribution, for 
fixed x the parametric form is continuous in $\xi$, so $\lim_{\xi \to 0} G_{\xi,\beta}(x) = G_{0,\beta}(x)$. The 
df and density of the GPD for various values of $\xi$ and $\beta = 1$ are shown in Figure 7.3. 
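
As a small illustration of Definition 7.16 and of the continuity in the shape parameter, the following sketch evaluates the GPD df for all three cases. The function name is an assumption made for this example, and the use of `log1p` and `expm1` is simply one numerically careful way to write (7.8) near $\xi = 0$.

```python
import math


def gpd_cdf(x, xi, beta):
    """Distribution function G_{xi,beta}(x) of the GPD, following (7.8)."""
    if beta <= 0.0:
        raise ValueError("beta must be positive")
    if x <= 0.0:
        return 0.0
    if xi == 0.0:
        return -math.expm1(-x / beta)  # 1 - exp(-x / beta)
    z = xi * x / beta
    if z <= -1.0:
        # only possible for xi < 0: x is at or beyond the right endpoint -beta / xi
        return 1.0
    # 1 - (1 + z)**(-1 / xi), written so that it stays accurate as xi -> 0
    return -math.expm1(-math.log1p(z) / xi)


# The three cases plotted in Figure 7.3 (beta = 1), evaluated at x = 1.
for xi in (0.5, 0.0, -0.5):
    print(xi, gpd_cdf(1.0, xi, 1.0))

# Continuity in xi: a tiny shape parameter is close to the exponential case.
print(gpd_cdf(1.0, 1e-9, 1.0), gpd_cdf(1.0, 0.0, 1.0))
```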

In terms of domains of attraction we have that $G_{\xi,\beta} \in \mathrm{MDA}(H_\xi)$ for all $\xi \in \mathbb{R}$. 
Note that, for $\xi > 0$ and $\xi < 0$, this assertion follows easily from the characterizations 
in Theorems 7.8 and 7.10. In the heavy-tailed case, $\xi > 0$, it may be easily 
verified that $E(X^k) = \infty$ for $k \geq 1/\xi$. The mean of the GPD is defined provided 
$\xi < 1$ and is 

$$E(X) = \beta/(1 - \xi). \tag{7.9}$$


The role of the GPD in EVT is as a natural model for the excess distribution over 
a high threshold. We define this concept along with the mean excess function, which 
will also play an important role in the theory. 


Definition 7.17 (excess distribution over threshold u). Let X be an rv with df F. 
The excess distribution over the threshold u has df 

$$F_u(x) = P(X - u \leq x \mid X > u) = \frac{F(x + u) - F(u)}{1 - F(u)} \tag{7.10}$$

for $0 \leq x < x_F - u$, where $x_F \leq \infty$ is the right endpoint of F. 


Definition 7.18 (mean excess function). The mean excess function of an rv X with 
finite mean is given by 

$$e(u) = E(X - u \mid X > u). \tag{7.11}$$

The excess df $F_u$ describes the distribution of the excess loss over the threshold 
u, given that u is exceeded. The mean excess function e(u) expresses the mean of 
$F_u$ as a function of u. In survival analysis the excess df is more commonly known as 
the residual life df—it expresses the probability that, say, an electrical component 
which has functioned for u units of time fails in the time period (u, u + x]. The 
mean excess function is known as the mean residual life function and gives the 
expected residual lifetime for components with different ages. For the special case 
of the GPD, the excess df and mean excess function are easily calculated. 


Example 7.19 (excess distribution of exponential and GPD). If F is the df of 
an exponential rv, then it is easily verified that $F_u(x) = F(x)$ for all x, which is 
the famous lack-of-memory property of the exponential distribution: the residual 
lifetime of the aforementioned electrical component would be independent of the 
amount of time that component has already survived. More generally, if X has df 
$F = G_{\xi,\beta}$, then, using (7.10), the excess df is easily calculated to be 

$$F_u(x) = G_{\xi,\beta(u)}(x), \quad \beta(u) = \beta + \xi u, \tag{7.12}$$

where $0 \leq x < \infty$ if $\xi \geq 0$ and $0 \leq x \leq -(\beta/\xi) - u$ if $\xi < 0$. The excess 
distribution remains a GPD with the same shape parameter $\xi$ but with a scaling that 
grows linearly with the threshold u. The mean excess function of the GPD is easily 
calculated from (7.12) and (7.9) to be 

$$e(u) = \frac{\beta(u)}{1 - \xi} = \frac{\beta + \xi u}{1 - \xi}, \tag{7.13}$$

where $0 \leq u < \infty$ if $0 \leq \xi < 1$ and $0 \leq u \leq -\beta/\xi$ if $\xi < 0$. It may be observed 
that the mean excess function is linear in the threshold u, which is a characterizing 
property of the GPD. 
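
The stability property in (7.12) can be checked numerically. The sketch below computes the excess survival function directly from the definition (7.10) and compares it with the closed form; the function names and the chosen parameter values are assumptions made purely for illustration.

```python
import math


def gpd_sf(x, xi, beta):
    """Survival function 1 - G_{xi,beta}(x) of the GPD for x >= 0."""
    if xi == 0.0:
        return math.exp(-x / beta)
    z = 1.0 + xi * x / beta
    return z ** (-1.0 / xi) if z > 0.0 else 0.0


def excess_sf(x, u, xi, beta):
    """Survival function of the excess over u, computed directly from (7.10)."""
    return gpd_sf(x + u, xi, beta) / gpd_sf(u, xi, beta)


def gpd_mean_excess(u, xi, beta):
    """Mean excess function (7.13) of the GPD; requires xi < 1."""
    return (beta + xi * u) / (1.0 - xi)


xi, beta, u = 0.5, 1.0, 2.0
for x in (0.1, 1.0, 10.0):
    direct = excess_sf(x, u, xi, beta)
    closed_form = gpd_sf(x, xi, beta + xi * u)  # right-hand side of (7.12)
    print(x, direct, closed_form)

# The mean excess grows linearly in u with slope xi / (1 - xi).
print([gpd_mean_excess(v, xi, beta) for v in (0.0, 1.0, 2.0)])
```

The two columns printed in the loop should agree up to floating-point error, and with $\xi = 0.5$ and $\beta = 1$ the mean excess values increase by $\xi/(1 - \xi) = 1$ for each unit increase in the threshold.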


Example 7.19 shows that the GPD has a kind of stability property under the 
operation of calculating excess distributions. We now give a mathematical result 
that shows that the GPD is, in fact, a natural limiting excess distribution for many 
underlying loss distributions. The result can also be viewed as a characterization 
theorem for the domain of attraction of the GEV distribution. In Section 7.1.2 we 
looked separately at characterizations for each of the three cases $\xi > 0$, $\xi = 0$ and 
$\xi < 0$; the following result offers a global characterization of $\mathrm{MDA}(H_\xi)$ for all $\xi$ 
in terms of the limiting behaviour of excess distributions over thresholds. 


Theorem 7.20 (Pickands–Balkema–de Haan). We can find a (positive-measurable 
function) $\beta(u)$ such that 

$$\lim_{u \to x_F} \sup_{0 \leq x < x_F - u} |F_u(x) - G_{\xi,\beta(u)}(x)| = 0,$$

if and only if $F \in \mathrm{MDA}(H_\xi)$, $\xi \in \mathbb{R}$. 


Thus the distributions for which normalized maxima converge to a GEV distri- 
bution constitute a set of distributions for which the excess distribution converges 
to the GPD as the threshold is raised; moreover, the shape parameter of the limiting 
GPD for the excesses is the same as the shape parameter of the limiting GEV distribution 
for the maxima. We have already stated in Section 7.1.2 that essentially all 
the commonly used continuous distributions of statistics are in $\mathrm{MDA}(H_\xi)$ for some 
$\xi$, so Theorem 7.20 proves to be a very widely applicable result that essentially says 
that the GPD is the canonical distribution for modelling excess losses over high 
thresholds. 


7.2.2 Modelling Excess Losses 


We exploit Theorem 7.20 by assuming that we are dealing with a loss distribution 
$F \in \mathrm{MDA}(H_\xi)$ so that, for some suitably chosen high threshold u, we can 
model $F_u$ by a generalized Pareto distribution. We formalize this with the following 
assumption. 


Assumption 7.21. Let F be a loss distribution with right endpoint $x_F$ and assume 
that for some high threshold u we have $F_u(x) = G_{\xi,\beta}(x)$ for $0 \leq x < x_F - u$ and 
some $\xi \in \mathbb{R}$ and $\beta > 0$. 


This is clearly an idealization, since in practice the excess distribution will gen- 
erally not be exactly GPD, but we use Assumption 7.21 to make a number of calcu- 
lations in the following sections. 


The method. Given loss data $X_1, \ldots, X_n$ from F, a random number $N_u$ will exceed 
our threshold u; it will be convenient to relabel these data $\tilde{X}_1, \ldots, \tilde{X}_{N_u}$. For each 
of these exceedances we calculate the amount $Y_j = \tilde{X}_j - u$ of the excess loss. We 
wish to estimate the parameters of a GPD model by fitting this distribution to the 
$N_u$ excess losses. There are various ways of fitting the GPD including maximum 
likelihood (ML) and probability-weighted moments (PWM). The former method is 
more commonly used and is easy to implement if the excess data can be assumed 
to be realizations of independent rvs, since the joint density will then be a product 
of marginal GPD densities. 

Writing $g_{\xi,\beta}$ for the density of the GPD, the log-likelihood may be easily calculated 
to be 

$$\ln L(\xi, \beta; Y_1, \ldots, Y_{N_u}) = \sum_{j=1}^{N_u} \ln g_{\xi,\beta}(Y_j) = -N_u \ln \beta - \Bigl(1 + \frac{1}{\xi}\Bigr) \sum_{j=1}^{N_u} \ln\Bigl(1 + \xi \frac{Y_j}{\beta}\Bigr), \tag{7.14}$$

which must be maximized subject to the parameter constraints that $\beta > 0$ and 
$1 + \xi Y_j/\beta > 0$ for all j. Solving the maximization problem yields a GPD model 
$G_{\hat{\xi},\hat{\beta}}$ for the excess distribution $F_u$. 
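
To make the constrained maximization concrete, here is a minimal sketch of the log-likelihood (7.14) together with a deliberately crude grid search on simulated excesses. Everything beyond (7.14) itself is an assumption of this illustration: the function names, the simulation by inversion of (7.8), the seed and the grid. In practice a numerical optimizer would replace the grid search.

```python
import math
import random


def gpd_loglik(xi, beta, excesses):
    """Log-likelihood (7.14) of excesses Y_1, ..., Y_{N_u}; -inf outside the constraints."""
    if beta <= 0.0:
        return -math.inf
    n_u = len(excesses)
    if xi == 0.0:
        # exponential limit of (7.14)
        return -n_u * math.log(beta) - sum(excesses) / beta
    total = 0.0
    for y in excesses:
        z = 1.0 + xi * y / beta
        if z <= 0.0:
            return -math.inf  # violates 1 + xi * Y_j / beta > 0
        total += math.log(z)
    return -n_u * math.log(beta) - (1.0 + 1.0 / xi) * total


def simulate_gpd(n, xi, beta, rng):
    """Draw n GPD variates by inverting (7.8); xi must be non-zero here."""
    return [beta / xi * ((1.0 - rng.random()) ** (-xi) - 1.0) for _ in range(n)]


def grid_search_mle(excesses, xi_grid, beta_grid):
    """Crude maximizer of (7.14) over a rectangular grid; returns (loglik, xi, beta)."""
    return max((gpd_loglik(xi, beta, excesses), xi, beta)
               for xi in xi_grid for beta in beta_grid)


rng = random.Random(0)
data = simulate_gpd(500, xi=0.5, beta=1.0, rng=rng)
xi_grid = [i / 50 for i in range(1, 51)]      # 0.02, 0.04, ..., 1.00
beta_grid = [j / 50 for j in range(25, 101)]  # 0.50, 0.52, ..., 2.00
best_ll, xi_hat, beta_hat = grid_search_mle(data, xi_grid, beta_grid)
print(xi_hat, beta_hat, best_ll)
```

Returning `-inf` when a constraint is violated keeps infeasible parameter pairs from ever being selected, which is a simple way to respect the support condition without a constrained optimizer.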


Non-iid data. For insurance or operational risk data the iid assumption is often 
unproblematic, but this is clearly not true for time series of financial returns. If the 
data are serially dependent but show no tendency to give clusters of extreme values, 
then this might suggest that the underlying process has extremal index $\theta = 1$. In this 
case, asymptotic theory that we summarize in Section 7.4 suggests a limiting model 
for high-level threshold exceedances, in which exceedances occur according to a 
Poisson process and the excess loss amounts are iid generalized Pareto distributed. 
If extremal clustering is present, suggesting an extremal index $\theta < 1$ (as would 
be consistent with an underlying GARCH process), the assumption of independent 
excess losses is less satisfactory. The easiest approach is to neglect this problem 
and to consider the ML method to be a quasi-maximum likelihood (QML) method, 
where the likelihood is misspecified with respect to the serial dependence structure 
of the data; we follow this course in this section. The point estimates should still be 
reasonable, although standard errors may be too small. In Section 7.4 we discuss 
threshold exceedances in non-iid data in more detail. 


Excesses over higher thresholds. From the model we have fitted to the excess 
distribution over u we can easily infer a model for the excess distribution over any 
higher threshold. We have the following lemma. 


Lemma 7.22. Under Assumption 7.21 it follows that $F_v(x) = G_{\xi,\beta+\xi(v-u)}(x)$ for 
any higher threshold $v \geq u$. 


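Proof. We use (7.10) and the df of the GPD in (7.8) to infer that (bars denote 
survival functions, e.g. $\bar{F} = 1 - F$) 

$$\begin{aligned} \bar{F}_v(x) &= \frac{\bar{F}(v + x)}{\bar{F}(v)} = \frac{\bar{F}(u + (x + v - u))}{\bar{F}(u)} \, \frac{\bar{F}(u)}{\bar{F}(u + (v - u))} \\ &= \frac{\bar{F}_u(x + v - u)}{\bar{F}_u(v - u)} = \frac{\bar{G}_{\xi,\beta}(x + v - u)}{\bar{G}_{\xi,\beta}(v - u)} \\ &= \bar{G}_{\xi,\beta+\xi(v-u)}(x). \end{aligned}$$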


Thus the excess distribution over higher thresholds remains a GPD with the same 
$\xi$ parameter but a scaling that grows linearly with the threshold v. Provided that 
$\xi < 1$, the mean excess function is given by 

$$e(v) = \frac{\beta + \xi(v - u)}{1 - \xi} = \frac{\xi v}{1 - \xi} + \frac{\beta - \xi u}{1 - \xi}, \tag{7.15}$$

where $u \leq v < \infty$ if $0 \leq \xi < 1$ and $u \leq v \leq u - \beta/\xi$ if $\xi < 0$. 
The linearity of the mean excess function (7.15) in v is commonly used as a 
diagnostic for data admitting a GPD model for the excess distribution. It forms 
the basis for the following simple graphical method for choosing an appropriate 
threshold. 


Sample mean excess plot. For positive-valued loss data $X_1, \ldots, X_n$ we define 
the sample mean excess function to be an empirical estimator of the mean excess 
function in Definition 7.18. The estimator is given by 

$$e_n(v) = \frac{\sum_{i=1}^{n} (X_i - v) I_{\{X_i > v\}}}{\sum_{i=1}^{n} I_{\{X_i > v\}}}. \tag{7.16}$$

To study this function we generally construct the mean excess plot $\{(X_{i,n}, e_n(X_{i,n})) : 2 \leq i \leq n\}$, 
where $X_{i,n}$ denotes the ith order statistic. If the data support a GPD 
model over a high threshold, then (7.15) suggests that this plot should become 
increasingly “linear” for higher values of v. A linear upward trend indicates a GPD 
model with positive shape parameter $\xi$; a plot tending towards the horizontal indicates 
a GPD with approximately zero shape parameter, or, in other words, an exponential 
excess distribution; a linear downward trend indicates a GPD with negative 
shape parameter. 
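
The estimator (7.16) and the points of the mean excess plot can be computed in a few lines. The sketch below uses hypothetical function names and a tiny made-up dataset purely to show the mechanics; it recomputes the average from scratch at every threshold, which is quadratic in the sample size but adequate for illustration.

```python
def sample_mean_excess(data, v):
    """Sample mean excess function e_n(v) of (7.16); None if no observation exceeds v."""
    excesses = [x - v for x in data if x > v]
    if not excesses:
        return None
    return sum(excesses) / len(excesses)


def mean_excess_plot_points(data):
    """Points (X_{i,n}, e_n(X_{i,n})) at every distinct order statistic except the maximum."""
    thresholds = sorted(set(data))[:-1]  # nothing exceeds the largest observation
    return [(v, sample_mean_excess(data, v)) for v in thresholds]


# Made-up losses, for illustration only.
losses = [1.0, 2.0, 3.0, 4.0, 10.0]
for v, e in mean_excess_plot_points(losses):
    print(v, e)
```

The maximum is excluded as a threshold because the denominator of (7.16) would be zero there, which matches the index range $2 \leq i \leq n$ in the definition of the plot.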

These are the ideal situations but in practice some experience is required to read 
mean excess plots. Even for data that are genuinely generalized Pareto distributed, 
the sample mean excess plot is seldom perfectly linear, particularly towards the 
right-hand end, where we are averaging a small number of large excesses. In fact 
we often omit the final few points from consideration, as they can severely distort 
the picture. If we do see visual evidence that the mean excess plot becomes linear, 
then we might select as our threshold u a value towards the beginning of the linear 
section of the plot (see, in particular, Example 7.24). 


Example 7.23 (Danish fire loss data). The Danish fire insurance data are a well- 
studied set of financial losses that neatly illustrate the basic ideas behind modelling 
observations that seem consistent with an iid model. The dataset consists of 2156 
fire insurance losses over 1 000 000 Danish kroner from 1980 to 1990 inclusive. The 
loss figure represents a combined loss for a building and its contents, as well as in 
some cases a loss of business earnings; the losses are inflation adjusted to reflect 
1985 values and are shown in Figure 7.4(a). 

The mean excess plot in Figure 7.4(b) is in fact fairly “linear” over the entire range 
of the losses and its upward slope leads us to expect that a GPD with positive shape 
parameter $\xi$ could be fitted to the entire dataset. However, there is some evidence 
of a “kink” in the plot below the value 10 and a “straightening out” of the plot 
above this value, so we have chosen to set our threshold at u = 10 and fit a GPD 
to excess losses above this threshold, in the hope of obtaining a model that is a 
good fit to the largest of the losses. The ML parameter estimates are $\hat{\xi} = 0.50$ and 
$\hat{\beta} = 7.0$ with standard errors 0.14 and 1.1, respectively. Thus the model we have 
fitted is essentially a very heavy-tailed, infinite-variance model. A picture of the 
fitted GPD model for the excess distribution $\hat{F}_u(x - u)$ is also given in Figure 7.4(c), 
superimposed on points plotted at empirical estimates of the excess probabilities for 
each loss; note the good correspondence between the empirical estimates and the 
GPD curve. 

In insurance we might use the model to estimate the expected size of the insur- 
ance loss, given that it enters a given insurance layer. Thus we can estimate 
the expected loss size given exceedance of the threshold of 10000000 kroner 
or of any other higher threshold by using (7.15) with the appropriate parameter 
estimates. 
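
As a worked illustration of this use of (7.15), the sketch below plugs in the rounded estimates from Example 7.23 (threshold 10, shape 0.50, scale 7.0, in millions of kroner). The function name and the choice of higher thresholds are assumptions for this example, and the outputs inherit the rounding of the inputs.

```python
def gpd_mean_excess_above(v, u, xi, beta):
    """Mean excess e(v) over v >= u from (7.15), given a GPD(xi, beta) model for excesses over u."""
    if xi >= 1.0:
        raise ValueError("the mean excess is only defined for xi < 1")
    if v < u:
        raise ValueError("v must be at least the fitted threshold u")
    return (beta + xi * (v - u)) / (1.0 - xi)


# Rounded estimates from Example 7.23, in millions of Danish kroner.
u, xi_hat, beta_hat = 10.0, 0.50, 7.0

for v in (10.0, 20.0, 50.0):
    e_v = gpd_mean_excess_above(v, u, xi_hat, beta_hat)
    # expected loss size given that the loss exceeds v is v + e(v)
    print(v, e_v, v + e_v)
```

Under these inputs the expected loss given exceedance of 10 million kroner is 24 million, and the expected excess grows by $\xi/(1 - \xi) = 1$ million for every additional million of threshold.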


Figure 7.4. (a) Time series plot of the Danish data. (b) Sample mean excess plot. 
(c) Empirical distribution of excesses and fitted GPD. See Example 7.23 for full details. 


Example 7.24 (AT&T weekly loss data). Suppose we have an investment in AT&T 
stock and want to model weekly losses in value using an unconditional approach. If 
$X_t$ denotes the weekly log-return, then the percentage loss in value of our position 
over a week is given by $L_t = 100(1 - \exp(X_t))$ and data on this loss for the 521 
complete weeks in the period 1991-2000 are shown in Figure 7.5(a). 

A sample mean excess plot of the positive loss values is shown in Figure 7.5(b) 
and this suggests that a threshold can be found above which a GPD approximation to 
the excess distribution should be possible. We have chosen to position the threshold 
at a loss value of 2.75%, which is marked by a vertical line on the plot and gives 
102 exceedances. 

We observed in Section 4.1 that monthly AT&T return data over the period 1993-2000 
do not appear consistent with a strict white noise hypothesis, so the issue of 
whether excess losses can be modelled as independent is relevant. This issue is taken 
up in Section 7.4 but for the time being we ignore it and implement a standard ML 
approach to estimating the parameters of a GPD model for the excess distribution; 
we obtain the estimates $\hat{\xi} = 0.22$ and $\hat{\beta} = 2.1$ with standard errors 0.13 and 0.34, 
respectively. Thus the fitted model is close to having an 
infinite fourth moment. A picture of the fitted GPD model for the excess distribution 
$\hat{F}_u(x - u)$ is also given in Figure 7.5(c), superimposed on points plotted at empirical 
estimates of the excess probabilities for each loss. 

Figure 7.5. (a) Time series plot of AT&T weekly percentage loss data. (b) Sample mean 
excess plot. (c) Empirical distribution of excesses and fitted GPD. See Example 7.24 for full 
details. 


7.2.3 Modelling Tails and Measures of Tail Risk 


In this section we describe how the GPD model for the excess losses is used to 
estimate the tail of the underlying loss distribution F and associated risk measures. 
To make the necessary theoretical calculations we again make Assumption 7.21. 


Tail probabilities and risk measures. We observe firstly that under Assumption 7.21 
we have, for $x \geq u$, 

$$\begin{aligned} \bar{F}(x) &= P(X > u) P(X > x \mid X > u) \\ &= \bar{F}(u) P(X - u > x - u \mid X > u) \\ &= \bar{F}(u) \bar{F}_u(x - u) \\ &= \bar{F}(u) \Bigl(1 + \xi \frac{x - u}{\beta}\Bigr)^{-1/\xi}, \end{aligned} \tag{7.17}$$


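which, if we know $\bar{F}(u)$, gives us a formula for tail probabilities. This formula 
may be inverted to obtain a high quantile of the underlying distribution, which we 
interpret as a VaR. For $\alpha \geq F(u)$ we have that VaR is equal to 

$$\mathrm{VaR}_\alpha = q_\alpha(F) = u + \frac{\beta}{\xi}\biggl(\Bigl(\frac{1 - \alpha}{\bar{F}(u)}\Bigr)^{-\xi} - 1\biggr). \tag{7.18}$$

Assuming that $\xi < 1$ the associated expected shortfall can be calculated easily 
from (2.23) and (7.18). We obtain 

$$\mathrm{ES}_\alpha = \frac{1}{1 - \alpha} \int_\alpha^1 q_x(F)\,dx = \frac{\mathrm{VaR}_\alpha}{1 - \xi} + \frac{\beta - \xi u}{1 - \xi}. \tag{7.19}$$
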
Note that Assumption 7.21 and Lemma 7.22 imply that excess losses above $\mathrm{VaR}_\alpha$ 
have a GPD distribution satisfying $F_{\mathrm{VaR}_\alpha} = G_{\xi,\beta+\xi(\mathrm{VaR}_\alpha - u)}$. The expected shortfall 
estimator in (7.19) can also be obtained by adding the mean of this distribution to 
$\mathrm{VaR}_\alpha$, i.e. $\mathrm{ES}_\alpha = \mathrm{VaR}_\alpha + e(\mathrm{VaR}_\alpha)$, where $e(\mathrm{VaR}_\alpha)$ is given in (7.15). It is interesting 
to look at how the ratio of the two risk measures behaves for large values of the 
quantile probability $\alpha$. It is easily calculated from (7.18) and (7.19) that 

$$\lim_{\alpha \to 1} \frac{\mathrm{ES}_\alpha}{\mathrm{VaR}_\alpha} = \begin{cases} (1 - \xi)^{-1}, & \xi \geq 0, \\ 1, & \xi < 0, \end{cases} \tag{7.20}$$

so the shape parameter $\xi$ of the GPD effectively determines the ratio when we go 
far enough out into the tail. 


Estimation in practice. We note that, under Assumption 7.21, tail probabilities, 
VaRs and expected shortfalls are all given by formulas of the form $g(\xi, \beta, \bar{F}(u))$. 
Assuming that we have fitted a GPD to excess losses over a threshold u, as described 
in Section 7.2.2, we estimate these quantities by first replacing $\xi$ and $\beta$ in formulas 
(7.17)-(7.19) by their estimates. Of course, we also require an estimate of $\bar{F}(u)$ 
and here we take the simple empirical estimator $N_u/n$. In doing this, we are implicitly 
assuming that there is a sufficient proportion of sample values above the threshold 
u to estimate $\bar{F}(u)$ reliably. However, we hope to gain over the empirical method by 
using a kind of extrapolation based on the GPD for more extreme tail probabilities 
and risk measures. For tail probabilities we obtain an estimator, first proposed by 
Smith (1987), of the form 

$$\hat{\bar{F}}(x) = \frac{N_u}{n} \Bigl(1 + \hat{\xi} \frac{x - u}{\hat{\beta}}\Bigr)^{-1/\hat{\xi}}, \tag{7.21}$$

which we stress is only valid for $x \geq u$. For $\alpha \geq 1 - N_u/n$ we obtain analogous 
point estimators of $\mathrm{VaR}_\alpha$ and $\mathrm{ES}_\alpha$ from (7.18) and (7.19). 

Of course we would also like to obtain confidence intervals. If we have taken the 
likelihood approach to estimating $\xi$ and $\beta$, then it is quite easy to give confidence 
intervals for $g(\hat{\xi}, \hat{\beta}, N_u/n)$ that take into account the uncertainty in $\hat{\xi}$ and $\hat{\beta}$, but 
neglect the uncertainty in $N_u/n$ as an estimator of $\bar{F}(u)$. We use the approach 
described at the end of Section 7.1.4 for return levels, whereby the GPD model is 
reparametrized in terms of $\phi = g(\xi, \beta, N_u/n)$ and a confidence interval for $\phi$ is 
constructed based on the likelihood ratio test as in Section A.3.5. 


Example 7.25 (risk measures for AT&T loss data). Suppose we have fitted a GPD 
model to excess weekly losses above the threshold u = 2.75% as in Example 7.24. 
We use this model to obtain estimates of the 99% VaR and expected shortfall of the 
underlying weekly loss distribution. The essence of the method is displayed in Fig- 
ure 7.6; this is a plot of estimated tail probabilities on logarithmic axes, with various 
dotted lines superimposed to indicate the estimation of risk measures and associated 
confidence intervals. The points on the graph are the 102 threshold exceedances and 
are plotted at y-values corresponding to the tail of the empirical distribution function; 
the smooth curve running through the points is the tail estimator (7.21). 

Estimation of the 99% quantile amounts to determining the point of intersection of 
the tail estimation curve and the horizontal line $\bar{F}(x) = 0.01$ (not marked on graph); 
the first vertical dotted line shows the quantile estimate. The horizontal dotted line 
aids in the visualization of a 95% confidence interval for the VaR estimate; the 
degree of confidence is shown on the alternative y-axis to the right of the plot. 
The boundaries of a 95% confidence interval are obtained by determining the two 
points of intersection of this horizontal line with the dotted curve, which is a profile 
likelihood curve for the VaR as a parameter of the GPD model and is constructed 
using likelihood ratio test arguments as in Section A.3.5. Dropping the horizontal 
line to the 99% mark would correspond to constructing a 99% confidence interval 
for the estimate of the 99% VaR. The point estimate and 95% confidence interval 
for the 99% quantile are estimated to be 11.7% and (9.6, 16.1). 

The second vertical line on the plot shows the point estimate of the 99% expected 
shortfall. A 95% confidence interval is determined from the dotted horizontal line 
and its points of intersection with the second dotted curve. The point estimate and 
95% confidence interval are 17.0% and (12.7, 33.6). Note that if we divide the point 
estimates of the shortfall and the VaR we get $17/11.7 \approx 1.45$, which is larger than 
the asymptotic ratio $(1 - \hat{\xi})^{-1} = 1.29$ suggested by (7.20); this is generally the case 
at finite levels and is explained by the second term in (7.19) being a non-negligible 
positive quantity. 
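
The point estimates in this example can be approximately reproduced from the plug-in formulas. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the inputs are the rounded estimates from Example 7.24 (shape 0.22, scale 2.1, threshold 2.75, 102 exceedances out of 521 weeks), and the shape parameter is assumed to be non-zero.

```python
def tail_prob(x, u, xi, beta, n_u, n):
    """Tail estimator (7.21); only valid for x >= u and xi != 0."""
    return (n_u / n) * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)


def var_gpd(alpha, u, xi, beta, n_u, n):
    """VaR from (7.18) with the tail probability at u replaced by N_u / n."""
    return u + (beta / xi) * (((1.0 - alpha) / (n_u / n)) ** (-xi) - 1.0)


def es_gpd(alpha, u, xi, beta, n_u, n):
    """Expected shortfall from (7.19); requires xi < 1."""
    var = var_gpd(alpha, u, xi, beta, n_u, n)
    return var / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)


# Rounded inputs taken from Example 7.24.
fit = dict(u=2.75, xi=0.22, beta=2.1, n_u=102, n=521)

var_99 = var_gpd(0.99, **fit)
es_99 = es_gpd(0.99, **fit)
print(var_99, es_99, es_99 / var_99)
print(tail_prob(var_99, **fit))  # recovers the tail probability 1 - alpha
```

With these rounded inputs the sketch gives a VaR of roughly 11.6 and an expected shortfall of roughly 16.7, slightly below the 11.7 and 17.0 quoted above; the gap is presumably due to rounding of the parameter estimates.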


Before leaving the topic of GPD tail modelling it is clearly important to see how 
sensitive our risk-measure estimates are to the choice of the threshold. Hitherto we 
have considered single choices of threshold u and looked at a series of incremental 
calculations that always build on the same GPD model for excesses over that thresh- 
old. We would hope that there is some robustness to our inference for different 
choices of threshold. 

Figure 7.6. The smooth curve through the points shows the estimated tail of the AT&T 
weekly percentage loss data using the estimator (7.21). Points are plotted at empirical tail 
probabilities calculated from empirical df. The vertical dotted lines show estimates of 99% 
VaR and expected shortfall. The other curves are used in the construction of confidence 
intervals. See Example 7.25 for full details. 


Example 7.26 (varying the threshold). In the case of the AT&T weekly loss data 
the influence of different thresholds is investigated in Figure 7.7. Given the importance 
of the $\xi$ parameter in determining the weight of the tail and the relationship 
between quantiles and expected shortfalls, we first show how estimates of $\xi$ vary 
as we consider a series of thresholds that give us between 20 and 150 exceedances. 
In fact, the estimates remain fairly constant around a value of approximately 0.2; a 
symmetric 95% confidence interval constructed from the standard error estimate is 
also shown, and indicates how the uncertainty about the parameter value decreases 
as the threshold is lowered or the number of threshold exceedances is increased. 
Point estimates of the 99% VaR and expected shortfall estimates are also shown. 
The former remain remarkably constant around 12%, while the latter show modest 
variability that essentially tracks the variability of the $\xi$ estimate. These pictures 
provide some reassurance that different thresholds do not lead to drastically different 
conclusions. We return to the issue of threshold choice again in Section 7.2.5. 

Figure 7.7. (a) Estimate of $\xi$ for different thresholds u and numbers of exceedances $N_u$, 
together with a 95% confidence interval based on the standard error. (b) Associated point 
estimates of the 99% VaR (solid line) and expected shortfall (dotted line). See Example 7.26 
for commentary. 


7.2.4 The Hill Method 


The GPD method is not the only way to estimate the tail of a distribution and, as an 
alternative, we describe in this section the well-known Hill approach to modelling 
the tails of heavy-tailed distributions. 


Estimating the tail index. For this method we assume that the underlying loss 
distribution is in the maximum domain of attraction of the Fréchet distribution so 
that, by Theorem 7.8, it has a tail of the form 


$$\bar{F}(x) = L(x) x^{-\alpha}, \tag{7.22}$$


for a slowly varying function L (see Definition 7.7) and a positive parameter a. 
Traditionally, in the Hill approach, interest centres on the tail index a, rather than 
its reciprocal £, which appears in (7.3). The goal is to find an estimator of a based 
on identically distributed data X1,..., Xn. 
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The Hill estimator can be derived in various ways (see EKM, pp. 330-336). 
Perhaps the most elegant is to consider the mean excess function of the generic 
logarithmic loss In X, where X is an rv with df (7.22). Writing e* for the mean 
excess function of In X and using integration by parts we find that 


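$$
\begin{aligned}
e^*(\ln u) &= E(\ln X - \ln u \mid \ln X > \ln u) \\
&= \frac{1}{\bar F(u)} \int_u^\infty (\ln x - \ln u)\, dF(x) \\
&= \frac{1}{\bar F(u)} \int_u^\infty \frac{\bar F(x)}{x}\, dx \\
&= \frac{1}{\bar F(u)} \int_u^\infty L(x)\, x^{-(\alpha+1)}\, dx.
\end{aligned}
$$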


For u sufficiently large, the slowly varying function $L(x)$ for $x > u$ can essentially be
treated as a constant and taken outside the integral. More formally, using Karamata's
Theorem (see Section A.1.3), we get, for $u \to \infty$,

$$e^*(\ln u) \sim \frac{L(u)\,u^{-\alpha}\,\alpha^{-1}}{\bar F(u)} = \alpha^{-1},$$

so $\lim_{u\to\infty} \alpha\, e^*(\ln u) = 1$. We expect to see similar tail behaviour in the sample
mean excess function $e_n^*$ (see (7.16)) constructed from the log observations. That
is, we expect that $e_n^*(\ln X_{k,n}) \approx \alpha^{-1}$ for n large and k sufficiently small, where
$X_{n,n} \le \cdots \le X_{1,n}$ are the order statistics as usual. Evaluating $e_n^*(\ln X_{k,n})$ gives us
the estimator $\hat\alpha^{-1} = \bigl((k-1)^{-1} \sum_{j=1}^{k-1} \ln X_{j,n} - \ln X_{k,n}\bigr)$. The standard form of the
Hill estimator is obtained by a minor modification:


$$\hat\alpha^{(H)}_{k,n} = \left(\frac{1}{k} \sum_{j=1}^{k} \ln X_{j,n} - \ln X_{k,n}\right)^{-1}, \qquad 2 \le k \le n. \tag{7.23}$$

The Hill estimator is one of the best-studied estimators in the EVT literature. The 
asymptotic properties (consistency, asymptotic normality) of this estimator (as sample
size $n \to \infty$, number of extremes $k \to \infty$ and the so-called tail-fraction
$k/n \to 0$) have been extensively investigated under various assumed models for the
data, including ARCH and GARCH (see Notes and Comments). We concentrate on 
the use of the estimator in practice and, in particular, on its performance relative to 
the GPD estimation approach. 

When the data are from a distribution with a tail that is close to a perfect power 
function, the Hill estimator is often a good estimator of $\alpha$, or its reciprocal $\xi$. In
practice, the general strategy is to plot Hill estimates for various values of k. This
gives the Hill plot $\{(k, \hat\alpha^{(H)}_{k,n}) : k = 2, \dots, n\}$. We hope to find a stable region in the
Hill plot where estimates constructed from different numbers of order statistics are 
quite similar. 
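
The coordinates of a Hill plot can be computed in a few lines. The following is only a minimal sketch of (7.23), not the authors' implementation: the function name `hill_estimates` and the simulated exact-Pareto sample with tail index 2 are assumptions introduced here for illustration, chosen because the tail is then a perfect power function.

```python
import math
import random


def hill_estimates(losses, k_max=None):
    """Return {k: Hill estimate of alpha} for k = 2, ..., k_max, as in (7.23)."""
    # Order statistics in decreasing order: x[0] = X_{1,n} >= x[1] = X_{2,n} >= ...
    x = sorted((v for v in losses if v > 0), reverse=True)
    n = len(x)
    k_max = n if k_max is None else min(k_max, n)
    logs = [math.log(v) for v in x]
    estimates = {}
    running_sum = logs[0]  # sum of ln X_{j,n} over j = 1, ..., k-1
    for k in range(2, k_max + 1):
        running_sum += logs[k - 1]  # the sum now runs over j = 1, ..., k
        mean_log_excess = running_sum / k - logs[k - 1]
        if mean_log_excess > 0:  # guard against ties among the top k values
            estimates[k] = 1.0 / mean_log_excess
    return estimates


# Hypothetical data: exact Pareto sample with P(X > x) = x^(-2) for x >= 1.
random.seed(1)
alpha_true = 2.0
sample = [(1.0 - random.random()) ** (-1.0 / alpha_true) for _ in range(5000)]
hill = hill_estimates(sample, k_max=500)
for k in (25, 50, 100, 200, 500):
    print(k, round(hill[k], 3))
```

Plotting the pairs `(k, hill[k])` gives the Hill plot; for real loss data only the positive observations are used, since the estimator works with logarithms.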


Example 7.27 (Hill plots). We construct Hill plots for the Danish fire data of Example 7.23
and the weekly percentage loss data (positive values only) of Example 7.24
(shown in Figure 7.8).

Figure 7.8. Hill plots showing estimates of the tail index $\alpha = 1/\xi$ for (a), (b) the AT&T
weekly percentage losses and (c), (d) the Danish fire loss data. Parts (b) and (d) are expanded
versions of sections of (a) and (c) showing Hill estimates based on up to 60 order statistics.


It is very easy to construct the Hill plot for all possible values of k, but it can be 
misleading to do so; practical experience (see Example 7.28) suggests that the best 
choices of k are relatively small—say 10-50 order statistics in a sample of size 1000. 
For this reason we have enlarged sections of the Hill plots showing the estimates 
obtained for values of k less than 60. 

For the Danish data the estimates of $\alpha$ obtained are between 1.5 and 2, suggesting $\xi$
estimates between 0.5 and 0.67, all of which correspond to infinite-variance models
for these data. Recall that the estimate derived from our GPD model in Example 7.23
was $\hat\xi = 0.50$. For the AT&T data there is no particularly stable region in the plot.
The $\alpha$ estimates based on $k = 2, \dots, 60$ order statistics mostly range from 2 to 4,
suggesting a $\xi$ value in the range 0.25-0.5, which is larger than the values estimated
in Example 7.26 with a GPD model.

Example 7.27 shows that the interpretation of Hill plots can be difficult. In practice,
various deviations from the ideal situation can occur. If the data do not come
from a distribution with a regularly varying tail, the Hill method is really not appro- 
priate and Hill plots can be very misleading. Serial dependence in the data can 
also spoil the performance of the estimator, although this is also true for the GPD 
estimator. EKM contains a number of Hill “horror plots” based on simulated data 
illustrating the issues that arise (see Notes and Comments). 


Hill-based tail estimates. For the risk-management applications of this book we 
are less concerned with estimating the tail index of heavy-tailed data and more 
concerned with tail and risk-measure estimates. We give a heuristic argument for a 
standard tail estimator based on the Hill approach. We assume a tail model of the 
form $\bar F(x) = C x^{-\alpha}$, $x \ge u > 0$, for some high threshold u; in other words, we
replace the slowly varying function by a constant for sufficiently large x. For an
appropriate value of k the tail index $\alpha$ is estimated by $\hat\alpha^{(H)}_{k,n}$ and the threshold u is
replaced by $X_{k,n}$ (or $X_{(k+1),n}$ in some versions); it remains to estimate C. Since
C can be written as $C = u^{\alpha} \bar F(u)$, this is equivalent to estimating $\bar F(u)$, and the
obvious empirical estimator is k/n (or (k − 1)/n in some versions). Putting these
ideas together gives us the Hill tail estimator in its standard form: 


$$\hat{\bar F}(x) = \frac{k}{n} \left(\frac{x}{X_{k,n}}\right)^{-\hat\alpha^{(H)}_{k,n}}, \qquad x \ge X_{k,n}. \tag{7.24}$$


Writing the estimator in this way emphasizes the way it is treated mathematically. For 
any pair k and n, both the Hill estimator and the associated tail estimator are treated 
as functions of the k upper order statistics from the sample of size n. Obviously it is 
possible to invert this estimator to get a quantile estimator and it is also possible to 
devise an estimator of expected shortfall using arguments about regularly varying 
tails. 
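
Setting the tail estimate (7.24) equal to $1 - q$ and solving for x gives the quantile estimate $X_{k,n}\,((n/k)(1 - q))^{-1/\hat\alpha^{(H)}_{k,n}}$. The sketch below illustrates this inversion under stated assumptions: the function names and the simulated Pareto sample (tail index 3) are hypothetical and are introduced only for this example.

```python
import math
import random


def hill_alpha(x_desc, k):
    """Hill estimate (7.23) from order statistics sorted in decreasing order."""
    mean_log = sum(math.log(v) for v in x_desc[:k]) / k
    return 1.0 / (mean_log - math.log(x_desc[k - 1]))


def hill_tail_prob(x, x_desc, k):
    """Hill tail estimator (7.24): estimate of P(X > x) for x >= X_{k,n}."""
    n = len(x_desc)
    threshold = x_desc[k - 1]
    if x < threshold:
        raise ValueError("the tail estimator is only used beyond X_{k,n}")
    return (k / n) * (x / threshold) ** (-hill_alpha(x_desc, k))


def hill_quantile(q, x_desc, k):
    """Invert (7.24) to estimate the q-quantile; requires 1 - q <= k/n."""
    n = len(x_desc)
    if 1.0 - q > k / n:
        raise ValueError("the requested quantile lies below X_{k,n}")
    return x_desc[k - 1] * ((n / k) * (1.0 - q)) ** (-1.0 / hill_alpha(x_desc, k))


# Hypothetical data: exact Pareto sample with P(X > x) = x^(-3) for x >= 1.
random.seed(2)
data = sorted(((1.0 - random.random()) ** (-1.0 / 3.0) for _ in range(2000)),
              reverse=True)
var_99 = hill_quantile(0.99, data, k=100)
true_var_99 = 0.01 ** (-1.0 / 3.0)  # exact 99% quantile of this Pareto law
print(round(var_99, 3), round(true_var_99, 3))
```

The restriction $1 - q \le k/n$ mirrors the requirement that the quantile estimate lie beyond the effective threshold $X_{k,n}$.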

The GPD-based tail estimator (7.21) is usually treated as a function of a random 
number $N_u$ of upper order statistics for a fixed threshold u. The different presentation
of these estimators in the literature is a matter of convention and we can easily recast
both estimators in a similar form. Suppose we rewrite (7.24) in the notation of (7.21)
by substituting $\hat\xi^{(H)}$, $u$ and $N_u$ for $1/\hat\alpha^{(H)}_{k,n}$, $X_{k,n}$ and k, respectively. We get


$$\hat{\bar F}(x) = \frac{N_u}{n} \left(1 + \hat\xi^{(H)}\, \frac{x - u}{\hat\xi^{(H)} u}\right)^{-1/\hat\xi^{(H)}}, \qquad x \ge u.$$


This estimator lacks the additional scaling parameter $\beta$ in (7.21) and tends not to
perform as well, as is shown in simulated examples in the next section. 


7.2.5 Simulation Study of EVT Quantile Estimators 


First we consider estimation of $\xi$ and then estimation of the high quantile $\mathrm{VaR}_\alpha$. In
both cases estimators are compared using mean squared errors (MSEs); we recall
that the MSE of an estimator $\hat\theta$ of a parameter $\theta$ is given by

$$\mathrm{MSE}(\hat\theta) = E(\hat\theta - \theta)^2 = (E(\hat\theta - \theta))^2 + \mathrm{var}(\hat\theta),$$

and thus has the well-known decomposition into squared bias plus variance. A good
estimator should keep both the bias term $E(\hat\theta - \theta)$ and the variance term
$\mathrm{var}(\hat\theta)$ small.

Figure 7.9. Comparison of (a) estimated MSE, (b) bias and (c) variance for the Hill (dotted
line) and GPD (solid line) estimators of $\xi$, the reciprocal of the tail index, as a function of k
(or $N_u$), the number of upper order statistics from a sample of 1000 t-distributed data with
four degrees of freedom. See Example 7.28 for details.

Since analytical evaluation of bias and variance is not possible, we calculate Monte 
Carlo estimates by simulating 1000 datasets in each experiment. The parameters of 
the GPD are determined in all cases by ML; PWM, the main alternative, gives 
slightly different results, but the conclusions are similar. 

We calculate estimates using the Hill method and the GPD method based on dif- 
ferent numbers of upper order statistics (or differing thresholds) and try to determine 
the choice of k (or $N_u$) that is most appropriate for a sample of size n. In the case
of estimating VaR we also compare the EVT estimators with the simple empirical 
quantile estimator. 


Example 7.28 (Monte Carlo experiment). We assume that we have a sample of
1000 iid data from a t distribution with four degrees of freedom and want to estimate
$\xi$, the reciprocal of the tail index, which in this case has the true value 0.25.
(This is demonstrated in Example 7.29 at the end of this chapter.) The Hill estimate
is constructed for k values in the range {2, ..., 200} and the GPD estimate is
constructed for k (or $N_u$) values in {30, 40, 50, ..., 400}. The results are shown in
Figure 7.9.
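
A reduced version of this experiment, covering the Hill estimator only, might look as follows. This is a sketch under stated assumptions rather than a reproduction of the study: it uses 200 replications instead of 1000 to keep the run time short, a small grid of k values, and hypothetical function names; the GPD estimator fitted by ML is omitted.

```python
import math
import random


def simulate_t(nu, n, rng):
    """Draw n iid Student t variates as Z / sqrt(chi2_nu / nu)."""
    return [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
            for _ in range(n)]


def hill_xi(x_desc, k):
    """Hill estimate of xi = 1/alpha from the k largest observations."""
    return sum(math.log(v) for v in x_desc[:k]) / k - math.log(x_desc[k - 1])


def hill_mse_study(nu=4.0, n=1000, k_values=(10, 20, 30, 50, 100, 200),
                   n_rep=200, seed=0):
    """Monte Carlo estimates of (bias, variance, MSE) of the Hill estimator of xi."""
    rng = random.Random(seed)
    xi_true = 1.0 / nu
    draws = {k: [] for k in k_values}
    for _ in range(n_rep):
        x_desc = sorted(simulate_t(nu, n, rng), reverse=True)
        for k in k_values:
            draws[k].append(hill_xi(x_desc, k))
    results = {}
    for k, est in draws.items():
        mean = sum(est) / n_rep
        bias = mean - xi_true
        var = sum((e - mean) ** 2 for e in est) / n_rep
        mse = sum((e - xi_true) ** 2 for e in est) / n_rep
        results[k] = (bias, var, mse)
    return results


results = hill_mse_study()
for k, (bias, var, mse) in results.items():
    print(k, round(bias, 4), round(var, 5), round(mse, 5))
```

With the variance computed using divisor `n_rep`, the printed MSE equals squared bias plus variance up to rounding, which is the decomposition recalled at the start of this section.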


The t distribution has a well-behaved regularly varying tail and the Hill estimator
gives better estimates of $\xi$ than the GPD method, with an optimal value of k
around 20-30. The variance plot shows where the Hill method gains over the GPD
method; the variance of the GPD-based estimator is much higher than that of the
Hill estimator for small numbers of order statistics. The magnitudes of the biases
are closer together, with the Hill method tending to overestimate $\xi$ and the GPD
method tending to underestimate it. If we were to use the GPD method, the optimal
choice of threshold would be one giving 100-150 exceedances.

Figure 7.10. Comparison of (a) estimated MSE, (b) bias and (c) variance for the Hill (dotted
line) and GPD (solid line) estimators of $\mathrm{VaR}_{0.99}$, as a function of k (or $N_u$), the number of
upper order statistics from a sample of 1000 t-distributed data with four degrees of freedom.
Dashed line also shows results for the (threshold-independent) empirical quantile estimator.
See Example 7.28 for details.

The conclusions change when we attempt to estimate the 99% VaR; the results are 
shown in Figure 7.10. The Hill method has a negative bias for low values of k but a 
rapidly growing positive bias for larger values of k; the GPD estimator has a positive 
bias that grows much more slowly; the empirical method has a negative bias. The 
GPD attains its lowest MSE value for a value of k around 100, but, more importantly, 
the MSE is very robust to the choice of k because of the slow growth of the bias. 
The Hill method performs well for 20 < k < 75 (we only use k values that lead to a 
quantile estimate beyond the effective threshold $X_{k,n}$) but then deteriorates rapidly.
Both EVT methods obviously outperform the empirical quantile estimator. Given 
the relative robustness of the GPD-based tail estimator to changes in k, the issue of 
threshold choice for this estimator seems less critical than for the Hill method. 


7.2.6 Conditional EVT for Financial Time Series 


The GPD method when applied to threshold exceedances in a financial return series 
(as in Examples 7.24 and 7.25) is essentially an unconditional method for estimating 
the tail of the P&L distribution and associated risk measures. In Chapter 2 we argued 
that a conditional risk-measurement approach may be more appropriate for short 
time horizons, and in Section 2.3.6 we observed that this generally led to better 
backtesting results. We now consider a simple adaptation of the GPD method to 
obtain conditional risk-measure estimates in a time series context. This adaptation 
uses the GARCH model and related ideas in Chapter 4. 

We assume in particular that we are in the framework of Section 4.4.2 so that
$L_{t-n+1}, \dots, L_t$ are negative log-returns generated by a strictly stationary time series
process $(L_t)$. This process is assumed to be of the form $L_t = \mu_t + \sigma_t Z_t$, where $\mu_t$
and $\sigma_t$ are $\mathcal{F}_{t-1}$-measurable and $(Z_t)$ are iid innovations with some unknown df G;
an example would be an ARMA model with GARCH errors. To obtain estimates of 
the risk measures 


$$\mathrm{VaR}^t_\alpha = \mu_{t+1} + \sigma_{t+1}\, q_\alpha(Z), \qquad \mathrm{ES}^t_\alpha = \mu_{t+1} + \sigma_{t+1}\, \mathrm{ES}_\alpha(Z),$$


we first fit a GARCH model by the QML procedure of Section 4.3.4 (since we do not
assume a particular innovation distribution) and use this to estimate $\mu_{t+1}$ and $\sigma_{t+1}$.
As an alternative we could use EWMA volatility forecasting instead. To estimate
$q_\alpha(Z)$ and $\mathrm{ES}_\alpha(Z)$ we essentially apply the GPD tail estimation procedure to the
innovation distribution G. To get round the problem that we do not observe data
directly from the innovation distribution, we treat the residuals from the GARCH
analysis as our data and apply the GPD tail estimation method of Section 7.2.3 to the
residuals. In particular, we estimate $q_\alpha(Z)$ and $\mathrm{ES}_\alpha(Z)$ using the VaR and expected
shortfall formulas in (7.18) and (7.19).

In Section 2.3.6 it was shown that this method gives good VaR estimates; in that 
example the sample size was taken to be n = 1000 and the threshold was always 
set so that there were 100 exceedances. In fact, the method also gives very good 
conditional expected shortfall estimates, as is shown in the original paper of McNeil 
and Frey (2000). 
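
The two-stage procedure can be sketched as follows. This is an illustration under several loudly stated assumptions, not the method of McNeil and Frey (2000) as implemented by the authors: volatility is forecast by EWMA (the alternative mentioned above) with a conventional decay of 0.94 rather than by GARCH QML; the conditional mean is taken to be zero; the GPD is fitted by probability-weighted moments rather than ML; and the VaR and expected shortfall expressions are the standard GPD-based ones, which we take to be those referred to as (7.18) and (7.19). All function names and the simulated GARCH(1, 1) data are hypothetical.

```python
import math
import random


def ewma_volatility(losses, lam=0.94):
    """EWMA volatility forecasts; entry t uses data up to t-1, last entry is the forecast."""
    start = losses[:50]
    var = sum(v * v for v in start) / len(start)  # crude starting value (assumption)
    sigma = [math.sqrt(var)]
    for v in losses:
        var = lam * var + (1.0 - lam) * v * v
        sigma.append(math.sqrt(var))
    return sigma


def gpd_pwm(excesses):
    """Probability-weighted-moment estimates (xi, beta) of a GPD (swapped in for ML)."""
    y = sorted(excesses)
    m = len(y)
    a0 = sum(y) / m
    # Plotting positions p_j = (j - 0.35)/m for j = 1..m, written with a 0-based index.
    a1 = sum((1.0 - (j + 0.65) / m) * v for j, v in enumerate(y)) / m
    xi = 2.0 - a0 / (a0 - 2.0 * a1)
    beta = 2.0 * a0 * a1 / (a0 - 2.0 * a1)
    return xi, beta


def gpd_var_es(q, u, xi, beta, n, n_u):
    """GPD-based quantile and expected shortfall (assumes xi != 0 and xi < 1)."""
    var_q = u + (beta / xi) * (((1.0 - q) * n / n_u) ** (-xi) - 1.0)
    es_q = var_q / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)
    return var_q, es_q


def conditional_evt(losses, q=0.99, n_exceed=100, lam=0.94):
    sigma = ewma_volatility(losses, lam)
    # Standardised residuals stand in for the unobserved innovations Z_t.
    resid = [v / s for v, s in zip(losses, sigma)]
    ordered = sorted(resid, reverse=True)
    u = ordered[n_exceed]  # threshold with n_exceed residuals above it
    xi, beta = gpd_pwm([z - u for z in ordered[:n_exceed]])
    q_z, es_z = gpd_var_es(q, u, xi, beta, len(resid), n_exceed)
    return sigma[-1] * q_z, sigma[-1] * es_z, xi


# Hypothetical data: GARCH(1, 1) losses with unit-variance t(4) innovations.
rng = random.Random(42)
nu, a0, a1, b = 4.0, 0.05, 0.10, 0.85
scale = math.sqrt((nu - 2.0) / nu)
sig2, x_prev, losses = a0 / (1.0 - a1 - b), 0.0, []
for _ in range(1000):
    sig2 = a0 + a1 * x_prev ** 2 + b * sig2
    z = scale * rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
    x_prev = math.sqrt(sig2) * z
    losses.append(x_prev)

var_99, es_99, xi_hat = conditional_evt(losses)
print(round(var_99, 3), round(es_99, 3), round(xi_hat, 3))
```

As in the example of Section 2.3.6, the sketch uses n = 1000 observations and a threshold giving 100 exceedances.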


Notes and Comments 


The ideas behind the important Theorem 7.20, which underlies GPD modelling, may 
be found in Pickands (1975) and Balkema and de Haan (1974). Important papers 
developing the technique in the statistical literature are Davison (1984) and Davison 
and Smith (1990). The estimation of the parameters of the GPD, both by ML and by 
the method of probability-weighted moments, is discussed in Hosking and Wallis 
(1987). The tail estimation formula (7.21) was suggested by Smith (1987) and the 
theoretical properties of this estimator for iid data in the domain of attraction of an 
extreme value distribution are extensively investigated in this paper. The Danish fire 
loss example is taken from McNeil (1997). 

The Hill estimator goes back to Hill (1975) (see also Hall 1982). The theoretical 
properties for dependent data, including linear processes with heavy-tailed innova- 
tions and ARCH and GARCH processes, were investigated by Resnick and Starica 
(1995, 1996). The idea of smoothing the estimator is examined in Resnick and Starica 
(1997) and Resnick (1997). For Hill “horror plots”, showing situations when the Hill 
estimator delivers particularly poor estimates of the tail index, see EKM, pp. 194, 
270 and 343. 

Alternative estimators based on order statistics include the estimator of Pickands 
(1975), which is also discussed in Dekkers and de Haan (1989), and the DEdH 
estimator of Dekkers, Einmahl and de Haan (1989). This latter estimator is used as 
the basis of a quantile estimator in de Haan and Rootzén (1993). Both the Pickands 
and DEdH estimators are designed to estimate general $\xi$ in the extreme value limit (in
contrast to the Hill estimator, which is designed for positive $\xi$); in empirical studies
the DEdH estimator seems to work better than the Pickands estimator. The issue of
the optimal number of order statistics in such estimators is taken up in a series of
papers by Dekkers and de Haan (1993) and Danielsson et al. (2001). A method is 
proposed which is essentially based on the bootstrap approach to estimating mean 
squared error discussed in Hall (1990). A review paper relevant for applications to 
insurance and finance is Matthys and Beirlant (2000). 

Analyses of the tails of financial data using methods based on the Hill estimator 
can be found in Koedijk, Schafgans and de Vries (1990), Lux (1996) and various 
papers by Danielsson and de Vries (1997a,b,c). The conditional EVT method was 
developed in McNeil and Frey (2000); a Monte Carlo method using the GPD model 
to estimate risk measures for the h-day loss distribution is also described. See also 
Gençay, Selçuk and Ulugülyağcı (2003) and Gençay and Selçuk (2004) for interesting
applications of EVT methodology to VaR estimation.


7.3 Tails of Specific Models 


In this short section we survey the tails of some of the more important distributions 
and models that we have encountered in this book. 


7.3.1 Domain of Attraction of Fréchet Distribution 


As stated in Section 7.1.2, the domain of attraction of the Fréchet distribution consists 
of distributions with regularly varying tails of the form $\bar F(x) = x^{-\alpha} L(x)$ for $\alpha > 0$,
where $\alpha$ is known as the tail index. These are heavy-tailed models where higher-order
moments cease to exist. Normalized maxima of random samples from such
distributions converge to a Fréchet distribution with shape parameter $\xi = 1/\alpha$,
and excesses over sufficiently high thresholds converge to a generalized Pareto
distribution with shape parameter $\xi = 1/\alpha$.

We now show that the Student t distribution and the inverse gamma distribution
are in this class; we analyse the former because of its general importance in financial
modelling and the latter because it appears as the mixing distribution that yields the
Student t in the class of normal variance mixture models (see Example 3.7). In
Section 7.3.3 we will see that the mixing distribution in a normal variance mixture
model essentially determines the tail of that model.

Both the t and inverse gamma distributions are presented in terms of their density,
and the analysis of their tails proves to be a simple application of a useful result 
known as Karamata’s Theorem, which is given in Section A.1.3. 


Example 7.29 (Student t distribution). It is easily verified that the standard univariate
t distribution with $\nu > 1$ has a density of the form $f_\nu(x) = x^{-(\nu+1)} L(x)$.
Hence Karamata's Theorem (see Theorem A.5) allows us to calculate the form of
the tail $\bar F_\nu(x) = \int_x^\infty f_\nu(y)\, dy$ by essentially treating the slowly varying function as
a constant and taking it out of the integral. We get


$$\bar F_\nu(x) = \int_x^\infty y^{-(\nu+1)} L(y)\, dy \sim \nu^{-1} x^{-\nu} L(x), \qquad x \to \infty,$$


from which we conclude that the df $F_\nu$ of a t distribution has tail index $\nu$ and
$F_\nu \in \mathrm{MDA}(H_{1/\nu})$ by Theorem 7.8.
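
The factorization of the density can be written out explicitly. The following is a short sketch using the usual form of the standard t density; the symbol $c_\nu$ for the normalizing constant is notation introduced here, not taken from the text.

```latex
f_\nu(x) = c_\nu \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}
         = x^{-(\nu+1)} \, c_\nu \left(x^{-2} + \frac{1}{\nu}\right)^{-(\nu+1)/2}
         =: x^{-(\nu+1)} L(x), \qquad x > 0,
\qquad c_\nu = \frac{\Gamma\bigl((\nu+1)/2\bigr)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)},

\lim_{x \to \infty} L(x) = c_\nu \, \nu^{(\nu+1)/2} \in (0, \infty),
\qquad\text{so that}\qquad
\bar F_\nu(x) \sim \nu^{-1} x^{-\nu} L(x) \sim c_\nu \, \nu^{(\nu-1)/2} \, x^{-\nu},
\qquad x \to \infty.
```

Since $L$ converges to a positive constant it is slowly varying, which is all that the application of Karamata's Theorem requires.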

Example 7.30 (inverse gamma distribution). The density of the inverse gamma
distribution is given in (A.11). It is of the form $f_{\alpha,\beta}(x) = x^{-(\alpha+1)} L(x)$, since
$\exp(-\beta/x) \to 1$ as $x \to \infty$. Using the same technique as in Example 7.29, we
deduce that this distribution has tail index $\alpha$, so $F_{\alpha,\beta} \in \mathrm{MDA}(H_{1/\alpha})$.


7.3.2 Domain of Attraction of Gumbel Distribution 


A mathematical characterization of the Gumbel class is that it consists of the 
so-called von Mises distribution functions and any other distributions which 
are tail equivalent to von Mises distributions (see EKM, pp. 138-150). We 
give the definitions of both of these concepts below. Note that distributions 
in this class can have both infinite and finite right endpoints; again we write 
$x_F = \sup\{x \in \mathbb{R} : F(x) < 1\} \le \infty$ for the right endpoint of F.


Definition 7.31 (von Mises distribution function). Suppose there exists some
$z < x_F$ such that F has the representation


$$\bar F(x) = c \exp\left(-\int_z^x \frac{1}{a(t)}\, dt\right), \qquad z < x < x_F,$$


where c is some positive constant, a(t) is a positive and absolutely continuous
function with density $a'$, and $\lim_{x \to x_F} a'(x) = 0$. Then F is called a von Mises
distribution function.


Definition 7.32 (tail equivalence). Two dfs F and G are called tail equivalent if
they have the same right endpoints (i.e. $x_F = x_G$) and $\lim_{x \to x_F} \bar F(x)/\bar G(x) = c$ for
some constant $0 < c < \infty$.


To decide whether a particular df F is a von Mises df, the following condition is
extremely useful. Assume there exists some $z < x_F$ such that F is twice differentiable
on $(z, x_F)$ with density $f = F'$ and $F'' < 0$ in $(z, x_F)$. Then F is a von Mises
df if and only if

$$\lim_{x \to x_F} \frac{\bar F(x)\, F''(x)}{f^2(x)} = -1. \tag{7.25}$$


We now use this condition to show that the gamma df is a von Mises df. 


Example 7.33 (gamma distribution). The density $f = f_{\alpha,\beta}$ of the gamma
distribution is given in (A.7), and a straightforward calculation yields
$F''(x) = f'(x) = -f(x)(\beta + (1 - \alpha)/x) < 0$, provided $x > \max((\alpha - 1)/\beta, 0)$.
Clearly, $\lim_{x\to\infty} F''(x)/f(x) = -\beta$. Moreover, using L'Hôpital's rule we get
$\lim_{x\to\infty} \bar F(x)/f(x) = \lim_{x\to\infty} -f(x)/f'(x) = \beta^{-1}$. Combining these two limits
establishes (7.25).
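
Condition (7.25) can be checked numerically in a case where the gamma survival function is available in closed form. The sketch below assumes shape $\alpha = 2$ and rate $\beta = 1.5$ (an illustrative choice made here), for which $\bar F(x) = e^{-\beta x}(1 + \beta x)$; the function name is hypothetical.

```python
import math


def von_mises_ratio(x, beta=1.5):
    """Left-hand side of (7.25) for the gamma df with alpha = 2 and rate beta."""
    sf = math.exp(-beta * x) * (1.0 + beta * x)  # survival function
    f = beta ** 2 * x * math.exp(-beta * x)  # density
    f_prime = beta ** 2 * math.exp(-beta * x) * (1.0 - beta * x)  # F'' = f'
    return sf * f_prime / f ** 2


for x in (5.0, 10.0, 50.0, 100.0):
    print(x, von_mises_ratio(x))
```

In this special case the ratio simplifies to $1/(\beta x)^2 - 1$, so the convergence to $-1$ is visible directly; very large x should be avoided in floating point because the squared density underflows.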


Example 7.34 (GIG distribution). The density of an rv $X \sim N^-(\lambda, \chi, \psi)$ with
the GIG distribution is given in (A.8). Let $F_{\lambda,\chi,\psi}(x)$ denote the df and consider the
case where $\psi > 0$. If $\psi = 0$, then the GIG is an inverse gamma distribution, which
was shown in Example 7.30 to be in the Fréchet class. If $\psi > 0$, then $\lambda > 0$, and
a similar technique to Example 7.33 could be used to establish that the GIG is a
von Mises df. In the case where $\lambda > 0$ it is easier to demonstrate tail equivalence
with a gamma distribution, which is the special case when $\chi = 0$. We observe that


$$\lim_{x\to\infty} \frac{\bar F_{\lambda,\chi,\psi}(x)}{\bar F_{\lambda,0,\psi}(x)} = \lim_{x\to\infty} \frac{f_{\lambda,\chi,\psi}(x)}{f_{\lambda,0,\psi}(x)} = c_{\lambda,\chi,\psi}$$


for some constant $c_{\lambda,\chi,\psi}$. It follows that $F_{\lambda,\chi,\psi} \in \mathrm{MDA}(H_0)$.


7.3.3 Mixture Models 


In this book we have considered a number of models for financial risk-factor changes 
that arise as mixtures of rvs. In Chapter 3 we introduced multivariate normal variance 
mixture models including the Student t, and (symmetric) generalized hyperbolic 
distributions, which have the general structure given in (3.19). A one-dimensional 
normal variance mixture (or the marginal distribution of a d-dimensional normal 
variance mixture) is of the same type (see Section A.1.1) as an rv X satisfying 


$$X \overset{d}{=} \sqrt{W}\, Z, \tag{7.26}$$


where $Z \sim N(0, 1)$ and W is an independent, positive-valued scalar rv. We would
like to know more about the tails of distributions satisfying (7.26). 

More generally, to understand the tails of the marginal distributions of elliptical 
distributions it suffices to consider spherical distributions, which have the stochastic 
representation 


$$X \overset{d}{=} R S \tag{7.27}$$


for a random vector S that is uniformly distributed on the unit sphere
$\mathcal{S}^{d-1} = \{s \in \mathbb{R}^d : s's = 1\}$, and an independent radial variate R (see Section 3.3.1 and
Theorem 3.22 in particular). Again we would like to know more about the tails of 
the marginal distributions of the vector X in (7.27). 

In Section 4.3 of Chapter 4 we considered strictly stationary stochastic processes
$(X_t)$, such as GARCH processes satisfying equations of the form


$$X_t = \sigma_t Z_t, \tag{7.28}$$


where $(Z_t)$ are strict white noise innovations, typically with a Gaussian or (more
realistically) a scaled Student t distribution, and $\sigma_t$ is an $\mathcal{F}_{t-1}$-measurable rv
representing volatility. These models can also be seen as mixture models and we would
like to know something about the tail of the stationary distribution of $(X_t)$.

A useful result for analysing the tails of mixtures is the following theorem due to 
Breiman (1965), which we immediately apply to spherical distributions. 


Theorem 7.35 (tails of mixture distributions). Let X be given by X = YZ for 
independent, non-negative rvs Y and Z such that 


(1) Y has a regularly varying tail with tail index $\alpha$;


(2) $E(Z^{\alpha+\epsilon}) < \infty$ for some $\epsilon > 0$.

Then X has a regularly varying tail with tail index $\alpha$ and

$$P(X > x) \sim E(Z^{\alpha})\, P(Y > x), \qquad x \to \infty.$$
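
The theorem can be illustrated by simulation in a case where the tail of the product is known exactly. The sketch below assumes Y is Pareto with $P(Y > y) = y^{-\alpha}$ for $y \ge 1$ and Z is uniform on (0, 1), choices made here purely for illustration; then $E(Z^\alpha) = 1/(\alpha + 1)$ and, by conditioning on Z, $P(YZ > x) = x^{-\alpha}/(\alpha + 1)$ holds exactly for $x \ge 1$. The function name is hypothetical.

```python
import random


def breiman_check(alpha=2.0, x=3.0, n_sim=200_000, seed=7):
    """Compare a Monte Carlo estimate of P(YZ > x) with E(Z^alpha) * P(Y > x)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        y = (1.0 - rng.random()) ** (-1.0 / alpha)  # Pareto: P(Y > y) = y^(-alpha), y >= 1
        z = rng.random()  # light-tailed factor with all moments finite
        if y * z > x:
            hits += 1
    empirical = hits / n_sim
    breiman = (1.0 / (alpha + 1.0)) * x ** (-alpha)  # E(Z^alpha) * P(Y > x)
    return empirical, breiman


empirical, breiman = breiman_check()
print(round(empirical, 4), round(breiman, 4))
```

For general Y and Z the relation is only asymptotic; the exact agreement here is a feature of the Pareto and uniform choices.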

Proposition 7.36 (tails of spherical distributions). Let $X \overset{d}{=} RS \sim S_d(\psi)$ have
a spherical distribution. If R has a regularly varying tail with tail index $\alpha$, then so
does $|X_i|$ for $i = 1, \dots, d$. If $E(R^k) < \infty$ for all $k > 0$, then $|X_i|$ does not have a
regularly varying tail.


Proof. Suppose that R has a regularly varying tail with tail index $\alpha$ and consider
$RS_i$. Since $|S_i|$ is a non-negative rv with finite support [0, 1] and finite moments, it
follows from Theorem 7.35 that $R|S_i|$, and hence $|X_i|$, are regularly varying with
tail index $\alpha$. If $E(R^k) < \infty$ for all $k > 0$, then $E|X_i|^k < \infty$ for all $k > 0$, so that
$|X_i|$ cannot have a regularly varying tail.


Example 7.37 (tails of normal variance mixtures). Suppose that $X = \sqrt{W} Z$ with
$Z \sim N_d(0, I_d)$ and W an independent scalar rv, so that both Z and X have spherical
distributions and X has a normal variance mixture distribution. The vector Z has the
spherical representation $Z \overset{d}{=} \tilde R S$, where $\tilde R^2 \sim \chi^2_d$ (see Example 3.24). The vector
X has the spherical representation $X \overset{d}{=} R S$, where $R \overset{d}{=} \sqrt{W} \tilde R$.

Now, the chi-squared distribution (being a gamma distribution) is in the domain of
attraction of the Gumbel distribution, so $E(\tilde R^k) = E((\tilde R^2)^{k/2}) < \infty$ for all $k > 0$.
We first consider the case when W has a regularly varying tail with tail index $\alpha$ so that
$\bar F_W(w) = L(w) w^{-\alpha}$. It follows that $P(\sqrt{W} > x) = P(W > x^2) = L_2(x) x^{-2\alpha}$,
where $L_2(x) := L(x^2)$ is also slowly varying, so that $\sqrt{W}$ has a regularly varying
tail with tail index $2\alpha$. By Theorem 7.35, $R \overset{d}{=} \sqrt{W} \tilde R$ also has a regularly varying
tail with tail index $2\alpha$ and, by Proposition 7.36, so do the components of |X|.

To consider a particular case, suppose that $W \sim \mathrm{Ig}(\tfrac12\nu, \tfrac12\nu)$, so that, by Example 7.30,
W is regularly varying with tail index $\tfrac12\nu$. Then $\sqrt{W}$ has a regularly
varying tail with tail index $\nu$ and so does $|X_i|$; this is hardly surprising because
$X \sim t_d(\nu, 0, I_d)$, implying that $X_i$ has a univariate Student t distribution with $\nu$
degrees of freedom, and we already know from Example 7.29 that this has tail
index $\nu$.

On the other hand, if $F_W \in \mathrm{MDA}(H_0)$, then $E(R^k) < \infty$ for all $k > 0$ and $|X_i|$
cannot have a regularly varying tail by Proposition 7.36. This means, for example,
that univariate generalized hyperbolic distributions do not have power tails (except
for the special boundary case corresponding to Student t) because the GIG is in
the maximum domain of attraction of the Gumbel distribution, as was shown in
Example 7.34.


Analysis of the tails of the stationary distribution of GARCH-type models is more 
challenging. In view of Theorem 7.35 and the foregoing examples, it is clear that 
when the innovations $(Z_t)$ are Gaussian, then the law of the process $(X_t)$ in (7.28)
will have a regularly varying tail if the volatility $\sigma_t$ has a regularly varying tail.
Mikosch and Starica (2000) analyse the GARCH(1, 1) model (see Definition 4.20),
where the squared volatility satisfies $\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta \sigma_{t-1}^2$. They show
that under relatively weak conditions on the innovation distribution of $(Z_t)$, the
volatility $\sigma_t$ has a regularly varying tail with tail index $\kappa$ given by the solution of
the equation

$$E\bigl((\alpha_1 Z_t^2 + \beta)^{\kappa/2}\bigr) = 1. \tag{7.29}$$

Table 7.2. Approximate theoretical values of the tail index $\kappa$ solving (7.29) for various
GARCH(1, 1) processes with Gaussian and Student t innovation distributions.

                                                t distribution
Parameters                            Gauss   $\nu = 8$   $\nu = 4$
$\alpha_1 = 0.2$, $\beta = 0.75$         4.4      3.5         2.7
$\alpha_1 = 0.1$, $\beta = 0.85$         9.1      5.8         3.4
$\alpha_1 = 0.05$, $\beta = 0.95$       21.1      7.9         3.9

In Table 7.2 we have calculated approximate values of $\kappa$ for various innovation distributions
and parameter values using numerical integration and root-finding procedures.
By Theorem 7.35 these are the values of the tail index for the stationary
distribution of the GARCH(1, 1) model itself. 
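
A calculation of this kind can be sketched for Gaussian innovations with Simpson's rule and bisection. The code below is an illustration only, not the procedure used to produce Table 7.2: the function names, the integration range, the step count and the bracketing strategy are assumptions made here, and the sketch presumes $\alpha_1 > 0$ and $\alpha_1 + \beta < 1$ so that a positive root exists above 2. For the first two parameter sets Table 7.2 reports 4.4 and 9.1.

```python
import math


def moment_minus_one(kappa, a1, b, z_max=12.0, steps=4000):
    """E((a1 * Z^2 + b)^(kappa/2)) - 1 for Z ~ N(0, 1), by Simpson's rule."""
    h = z_max / steps
    total = 0.0
    for i in range(steps + 1):
        z = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        dens = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * (a1 * z * z + b) ** (kappa / 2.0) * dens
    return 2.0 * total * h / 3.0 - 1.0  # factor 2: the integrand is even in z


def garch_tail_index(a1, b, tol=1e-8):
    """Positive root kappa of (7.29) for Gaussian innovations, by bisection."""
    lo, hi = 1.0, 2.0
    while moment_minus_one(hi, a1, b) <= 0.0:  # expand until the root is bracketed
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment_minus_one(mid, a1, b) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


kappa_1 = garch_tail_index(0.2, 0.75)
kappa_2 = garch_tail_index(0.1, 0.85)
print(round(kappa_1, 2), round(kappa_2, 2))
```

For Student t innovations the Gaussian density in `moment_minus_one` would have to be replaced by the density of the scaled t distribution and the integration range widened, since the integrand then decays only polynomially.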

Two main findings are obvious: for any fixed set of parameter values, the tail 
index gets smaller and the tail of the GARCH model gets heavier as we move to 
heavier-tailed innovation distributions; for any fixed innovation distribution, the tail 
of the GARCH model gets lighter as we decrease the ARCH effect ($\alpha_1$) and increase
the GARCH effect ($\beta$).


Tail dependence in elliptical distributions. We close this section by giving a result
that reveals an interesting connection between tail dependence in elliptical distributions
and regular variation of the radial rv R in the representation $X \overset{d}{=} \mu + RAS$
of an elliptically symmetric distribution given in Proposition 3.28.


Theorem 7.38. Let $X \overset{d}{=} \mu + RAS \sim E_d(\mu, \Sigma, \psi)$, where $\mu$, R, A and S are as
in Proposition 3.28 and we assume that $\sigma_{ii} > 0$ for all $i = 1, \dots, d$. If R has a
regularly varying tail with tail index $\alpha > 0$, then the coefficient of upper and lower
tail dependence between $X_i$ and $X_j$ is given by

$$\lambda(X_i, X_j) = \frac{\int_{(\pi/2 - \arcsin \rho_{ij})/2}^{\pi/2} \cos^{\alpha}(t)\, dt}{\int_0^{\pi/2} \cos^{\alpha}(t)\, dt}, \tag{7.30}$$

where $\rho_{ij}$ is the (i, j)th element of $P = \wp(\Sigma)$ and $\wp$ is the correlation operator
defined in (3.5).


An example where R has a regularly varying tail occurs in the case of the multivariate
t distribution $X \sim t_d(\nu, \mu, \Sigma)$. It is obvious from the arguments used in
Example 7.37 that the tail of the df of R is regularly varying with tail index $\alpha = \nu$.
Thus (7.30) with $\alpha$ replaced by $\nu$ gives an alternative expression to (5.31) for calculating
tail-dependence coefficients for the t copula $C^t_{\nu,P}$.

Arguably, the original expression (5.31) is easier to work with, since the df of
a univariate t distribution is available in statistical software packages. Moreover,
the equivalence of the two formulas allows us to conclude that we can use (5.31)
to evaluate tail-dependence coefficients for any bivariate elliptical distribution with
correlation parameter $\rho$ when the distribution of the radial rv R has a regularly
varying tail with tail index $\nu$.
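
Formula (7.30) is also straightforward to evaluate numerically. The following sketch uses a composite Simpson rule; the function names and the step count are assumptions made here. For $\alpha = 1$ the integrals are elementary and the coefficient reduces to $1 - \sin((\pi/2 - \arcsin\rho)/2)$, which gives a simple check of the code.

```python
import math


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b] with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def elliptical_tail_dependence(rho, alpha):
    """Coefficient of tail dependence (7.30) for correlation rho and tail index alpha."""
    lower = (math.pi / 2.0 - math.asin(rho)) / 2.0

    def integrand(t):
        # max(..., 0) guards against tiny negative cosines from rounding near pi/2
        return max(math.cos(t), 0.0) ** alpha

    numerator = simpson(integrand, lower, math.pi / 2.0)
    denominator = simpson(integrand, 0.0, math.pi / 2.0)
    return numerator / denominator


for rho in (-0.5, 0.0, 0.5, 0.9):
    print(rho, round(elliptical_tail_dependence(rho, alpha=4.0), 4))
```

The coefficient increases with the correlation and equals 1 at $\rho = 1$, where the lower integration limit is 0.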

Notes and Comments


Section 7.3 has been a highly selective account tailored to the study of a number 
of very specific models, and all of the theoretical subjects touched upon—regular 
variation, von Mises distributions, tails of products, tails of stochastic recurrence 
equations—can be studied in much greater detail. 

For more about regular variation, slow variation and Karamata’s Theorem see 
Bingham, Goldie and Teugels (1987) and Seneta (1976). A summary of the more 
important ideas with regard to the study of extremes is found in Resnick (1987). 
Section 7.3.2, with the exception of the examples, is taken from EKM, and detailed 
references to results on von Mises distributions and the maximum domain of attrac- 
tion of the Gumbel distribution are found therein. 

Theorem 7.35 follows from results of Breiman (1965). Related results on distri- 
butions of products are found in Embrechts and Goldie (1980). The discussion of 
tails of GARCH models is based on Mikosch and Starica (2000); the theory involves 
the study of stochastic recurrence relations and is essentially due to Kesten (1973). 
See also Mikosch (2003) for an excellent introduction to these ideas. 

The formula for tail-dependence coefficients in elliptical distributions when the 
radial rv has a regularly varying tail is taken from Hult and Lindskog (2002). Similar 
results were derived independently by Schmidt (2002); see also Frahm, Junker and 
Szimayer (2003) for a discussion of the applicability of such results to financial 
returns. 


7.4 Point Process Models 


In our discussion of threshold models in Section 7.2 we considered only the magni- 
tude of excess losses over high thresholds. In this section we consider exceedances 
of thresholds as events in time and use a point process approach to model the occur- 
rence of these events. We begin by looking at the case of regularly spaced iid data 
and discuss the well-known peaks-over-threshold (POT) model for the occurrence 
of extremes in such data; this model elegantly subsumes the models for maxima and 
the GPD models for excess losses that we have so far described. 

However, the assumptions of the standard POT model are typically violated 
by financial return series, because of the kind of serial dependence that volatil- 
ity clustering generates in such data. Our ultimate aim is to find more general 
point process models to describe the occurrence of extreme values in financial time 
series, and we find suitable candidates in the class of self-exciting point processes. 
These models are of a dynamic nature and can be used to estimate conditional 
VaRs; they offer an interesting alternative to the conditional EVT approach of Sec- 
tion 7.2.6 with the advantage that no prewhitening of data with GARCH processes 
is required. 

The following section gives an idea of the theory behind the POT model, but may
be skipped by readers who are content to go directly to a description of the standard
POT model in Section 7.4.2.


7.4.1 Threshold Exceedances for Strict White Noise


Consider a strict white noise process $(X_t)_{t \in \mathbb{N}}$ representing financial losses. While we
discuss the theory for iid variables for simplicity, the results we describe also hold for
dependent processes with extremal index $\theta = 1$, i.e. processes where extreme values
show no tendency to cluster (see Section 7.1.3 for examples of such processes). 

Throughout this section we assume that the common loss distribution is in the
maximum domain of attraction of an extreme value distribution (MDA($H_\xi$)) so
that (7.1) holds for the non-degenerate limiting distribution $H_\xi$ and normalizing
sequences $c_n$ and $d_n$. From (7.1) it follows, by taking logarithms and using
$\ln(1 - y) \sim -y$ as $y \to 0$, that for any fixed $x$ we have

$$
\begin{aligned}
\lim_{n \to \infty} n \ln(1 - \bar F(c_n x + d_n)) &= \ln H_\xi(x), \\
\lim_{n \to \infty} n \bar F(c_n x + d_n) &= -\ln H_\xi(x).
\end{aligned}
\tag{7.31}
$$


Throughout this section we also consider a sequence of thresholds $(u_n(x))$ defined
by $u_n(x) := c_n x + d_n$ for some fixed value of $x$. Clearly, (7.31) implies that we
have $n \bar F(u_n(x)) \to -\ln H_\xi(x)$ as $n \to \infty$ for this sequence of thresholds.

The number of losses in the sample $X_1, \ldots, X_n$ exceeding the threshold $u_n(x)$ is
a binomial rv, $N_{u_n(x)} \sim B(n, \bar F(u_n(x)))$, with expectation $n \bar F(u_n(x))$. Since (7.31)
holds, the standard Poisson limit result implies that, as $n \to \infty$, the number
of exceedances $N_{u_n(x)}$ converges to a Poisson rv with mean $\lambda(x) = -\ln H_\xi(x)$,
depending on the particular $x$ chosen.
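
The Poisson limit for the number of exceedances can be checked numerically. The following is a minimal sketch under an assumption that is not taken from the text: the losses are standard exponential, for which one may take $c_n = 1$ and $d_n = \ln n$, the limit is the Gumbel df, $\bar F(u_n(x)) = e^{-x}/n$ and $\lambda(x) = e^{-x}$. The function names are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """P(N = k) for N ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(N = k) for a Poisson rv with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def exceedance_count_gap(n, x, k_max=10):
    """Largest absolute gap, over counts 0..k_max, between the exact binomial
    pmf of the number of exceedances of u_n(x) = x + ln(n) and its Poisson
    limit. Assumes Exp(1) losses (illustrative choice) and x >= -ln(n)."""
    p = math.exp(-x) / n    # tail probability of Exp(1) at u_n(x)
    lam = math.exp(-x)      # -ln H(x) for the Gumbel limit
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, exceedance_count_gap(n, x=0.0))
```

The printed gaps shrink as $n$ grows, which is the behaviour the limit result predicts for this particular choice of distribution.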

The theory goes further. Not only is the number of exceedances asymptotically 
Poisson, these exceedances occur according to a Poisson point process. To state the 
result it is useful to give a brief summary of some ideas concerning point processes. 


On point processes. Suppose we have a sequence of rvs or vectors $Y_1, \ldots, Y_n$
taking values in some state space $\mathcal{X}$ (for example, $\mathbb{R}$ or $\mathbb{R}^2$) and we define, for any
set $A \subset \mathcal{X}$, the rv

$$ N(A) = \sum_{i=1}^{n} I_{\{Y_i \in A\}}, \tag{7.32} $$

which counts the random number of $Y_i$ in the set $A$. Under some technical conditions
(see EKM, pp. 220-223), (7.32) is said to define a point process $N(\cdot)$. An example
of a point process is the Poisson point process.


Definition 7.39 (Poisson point process). The point process $N(\cdot)$ is called a Poisson
point process (or Poisson random measure) on $\mathcal{X}$ with intensity measure $\Lambda$ if the
following two conditions are satisfied.

(a) For $A \subset \mathcal{X}$ and $k \ge 0$,

$$ P(N(A) = k) = \begin{cases} e^{-\Lambda(A)} \dfrac{\Lambda(A)^k}{k!}, & \Lambda(A) < \infty, \\ 0, & \Lambda(A) = \infty. \end{cases} $$

(b) For any $m \ge 1$, if $A_1, \ldots, A_m$ are mutually disjoint subsets of $\mathcal{X}$, then the
rvs $N(A_1), \ldots, N(A_m)$ are independent.


The intensity measure $\Lambda(\cdot)$ of $N(\cdot)$ is also known as the mean measure because
$E(N(A)) = \Lambda(A)$. We also speak of the intensity function (or simply intensity)
of the process, which is the derivative $\lambda(x)$ of the measure satisfying
$\Lambda(A) = \int_A \lambda(x)\,dx$.


Asymptotic behaviour of the point process of exceedances. Consider again the
strict white noise $(X_i)_{i \in \mathbb{N}}$ and sequence of thresholds $u_n(x) = c_n x + d_n$ for some
fixed $x$. For $n \in \mathbb{N}$ and $1 \le i \le n$ let $Y_{i,n} = (i/n) I_{\{X_i > u_n(x)\}}$ and observe that $Y_{i,n}$
can be thought of as returning either the normalized “time” $i/n$ of an exceedance,
or zero. The point process of exceedances of the threshold $u_n$ is the process $N_n(\cdot)$
with state space $\mathcal{X} = (0, 1]$ given by

$$ N_n(A) = \sum_{i=1}^{n} I_{\{Y_{i,n} \in A\}} \tag{7.33} $$


for $A \subset \mathcal{X}$. As the notation indicates, we consider this process to be an element
in a sequence of point processes indexed by $n$. The point process (7.33) counts
the exceedances with time of occurrence in the set $A$ and we are interested in the
behaviour of this process as $n \to \infty$.

It may be shown (see Theorem 5.3.2 in EKM) that $N_n(\cdot)$ converges in distribution
on $\mathcal{X}$ to a Poisson process $N(\cdot)$ with intensity measure $\Lambda(\cdot)$ satisfying
$\Lambda(A) = (t_2 - t_1)\lambda(x)$ for $A = (t_1, t_2) \subset \mathcal{X}$, where $\lambda(x) = -\ln H_\xi(x)$ as before. This implies,
in particular, that $E(N_n(A)) \to E(N(A)) = \Lambda(A) = (t_2 - t_1)\lambda(x)$. Clearly, the
intensity does not depend on time and takes the constant value $\lambda := \lambda(x)$; we refer
to the limiting process as a homogeneous Poisson process with intensity or rate $\lambda$.


Application of the result in practice. We give a heuristic argument explaining how
this limiting result is used in practice. We consider a fixed large sample size $n$ and a
fixed high threshold $u$, which we assume satisfies $u = c_n y + d_n$ for some value $y$.
We expect that the number of threshold exceedances can be approximated by a
Poisson rv and that the point process of exceedances of $u$ can be approximated by a
homogeneous Poisson process with rate $\lambda = -\ln H_\xi(y) = -\ln H_\xi((u - d_n)/c_n)$. If
we replace the normalizing constants $c_n$ and $d_n$ by $\sigma > 0$ and $\mu$, we have a Poisson
process with rate $-\ln H_{\xi,\mu,\sigma}(u)$. Clearly, we could repeat the same argument with
any high threshold so that, for example, we would expect it to be approximately true
that exceedances of the level $x \ge u$ occur according to a Poisson process with rate
$-\ln H_{\xi,\mu,\sigma}(x)$.

We thus have an intimate relationship between the GEV model for block maxima 
and a Poisson model for the occurrence in time of exceedances of a high threshold. 
The arguments of this section thus provide theoretical support for the observation in 
Figure 4.3: that exceedances for simulated iid t data are separated by waiting times 
that behave like iid exponential observations. 
7.4.2 The POT Model


The theory of the previous section combined with the theory of Section 7.2 suggests
an asymptotic model for threshold exceedances in regularly spaced iid data (or data
from a process with extremal index $\theta = 1$). The so-called POT model makes the
following assumptions.

• Exceedances occur according to a homogeneous Poisson process in time.

• Excess amounts above the threshold are iid and independent of exceedance
times.

• The distribution of excess amounts is generalized Pareto.


There are various alternative ways of describing this model. It might also be called a 
marked Poisson point process, where the exceedance times constitute the points 
and the GPD-distributed excesses are the marks. It can also be described as a 
(non-homogeneous) two-dimensional Poisson point process, where points (t, x) 
in two-dimensional space record times and magnitudes of exceedances. The latter 
representation is particularly powerful, as we now discuss. 


Two-dimensional Poisson formulation of POT model. Assume that we have regularly
spaced random losses $X_1, \ldots, X_n$ and that we set a high threshold $u$. We
assume that, on the state space $\mathcal{X} = (0, 1] \times (u, \infty)$, the point process defined by
$N(A) = \sum_{i=1}^{n} I_{\{(i/n, X_i) \in A\}}$ is a Poisson process with intensity at a point $(t, x)$ given
by

$$ \lambda(t, x) = \frac{1}{\sigma}\left(1 + \xi\frac{x - \mu}{\sigma}\right)^{-1/\xi - 1}, \tag{7.34} $$

provided $(1 + \xi(x - \mu)/\sigma) > 0$, and by $\lambda(t, x) = 0$ otherwise. Note that this intensity
does not depend on $t$ but does depend on $x$, and hence the two-dimensional Poisson
process is non-homogeneous; we simplify the notation to $\lambda(x) := \lambda(t, x)$. For a set
of the form $A = (t_1, t_2) \times (x, \infty) \subset \mathcal{X}$, the intensity measure is

$$ \Lambda(A) = \int_{t_1}^{t_2} \int_x^{\infty} \lambda(y)\,dy\,dt = -(t_2 - t_1) \ln H_{\xi,\mu,\sigma}(x). \tag{7.35} $$


It follows from (7.35) that for any $x \ge u$ the implied one-dimensional process of
exceedances of the level $x$ is a homogeneous Poisson process with rate
$\tau(x) := -\ln H_{\xi,\mu,\sigma}(x)$. Now consider the excess amounts over the threshold $u$. The tail of
the excess df over the threshold $u$, denoted $\bar F_u(x)$ before, can be calculated as the
ratio of the rates of exceeding the levels $u + x$ and $u$. We obtain

$$ \bar F_u(x) = \frac{\tau(u + x)}{\tau(u)} = \left(1 + \frac{\xi x}{\sigma + \xi(u - \mu)}\right)^{-1/\xi} = \bar G_{\xi,\beta}(x) $$

for a positive scaling parameter $\beta = \sigma + \xi(u - \mu)$. This is precisely the tail of
the GPD model for excesses over the threshold $u$ used in Section 7.2.2. Thus this
seemingly complicated model is indeed the POT model described informally at the
beginning of this section.
Note also that the model implies the GEV distributional model for maxima. To
see this, consider the event $\{M_n \le x\}$ for some value $x \ge u$. This may be
expressed in point process language as the event that there are no points in the set
$A = (0, 1] \times (x, \infty)$. The probability of this event is calculated to be
$P(M_n \le x) = P(N(A) = 0) = \exp(-\Lambda(A)) = H_{\xi,\mu,\sigma}(x)$, $x \ge u$, which is precisely the GEV
model for maxima of $n$-blocks used in Section 7.1.4.
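
The two identities behind this formulation can be verified numerically. The sketch below integrates the intensity (7.34) with a simple midpoint rule and compares the result with the closed form in (7.35), then checks that the ratio of exceedance rates matches the GPD tail with $\beta = \sigma + \xi(u - \mu)$. The parameter values used in the check are purely illustrative assumptions, the function names are hypothetical, and the code assumes $\xi \ne 0$.

```python
def intensity(y, xi, mu, sigma):
    """Intensity (7.34) at level y; zero outside the support."""
    z = 1.0 + xi * (y - mu) / sigma
    return z ** (-1.0 / xi - 1.0) / sigma if z > 0.0 else 0.0


def tau(x, xi, mu, sigma):
    """Closed-form rate -ln H_{xi,mu,sigma}(x) of exceeding level x (xi != 0)."""
    return (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi)


def intensity_measure(t1, t2, x, x_hi, xi, mu, sigma, steps=20000):
    """Midpoint-rule approximation of the integral of (7.34) over
    (t1, t2) x (x, x_hi); x_hi is a finite cut-off chosen by the caller."""
    h = (x_hi - x) / steps
    inner = sum(intensity(x + (i + 0.5) * h, xi, mu, sigma) for i in range(steps)) * h
    return (t2 - t1) * inner


def gpd_tail(x, xi, beta):
    """Tail of the GPD with shape xi and scale beta (xi != 0)."""
    return (1.0 + xi * x / beta) ** (-1.0 / xi)


if __name__ == "__main__":
    xi, mu, sigma, u = 0.2, 5.0, 2.0, 6.0   # illustrative values only
    numeric = intensity_measure(0.25, 0.75, 7.0, 60.0, xi, mu, sigma)
    closed = 0.5 * (tau(7.0, xi, mu, sigma) - tau(60.0, xi, mu, sigma))
    print(numeric, closed)
    beta = sigma + xi * (u - mu)
    print(tau(u + 1.5, xi, mu, sigma) / tau(u, xi, mu, sigma), gpd_tail(1.5, xi, beta))
```

In the same notation, the probability that no loss exceeds a level $x \ge u$ is `math.exp(-tau(x, xi, mu, sigma))`, which is the GEV df evaluated at $x$.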


Statistical estimation of the POT model. The most elegant way of fitting the POT
model to data is to fit the point process with intensity (7.34) to the exceedance data
in one step. Given the exceedance data $\{\tilde X_j : j = 1, \ldots, N_u\}$, the likelihood can be
written as

$$ L(\xi, \sigma, \mu; \tilde X_1, \ldots, \tilde X_{N_u}) = \exp(-\tau(u)) \prod_{j=1}^{N_u} \lambda(\tilde X_j). \tag{7.36} $$

Parameter estimates of $\xi$, $\sigma$ and $\mu$ are obtained by maximizing this expression,
which is easily accomplished by numerical means. For literature on the derivation
of this likelihood, see Notes and Comments.

There are, however, simpler ways of getting the same parameter estimates. Suppose
we reparametrize the POT model in terms of $\tau := \tau(u) = -\ln H_{\xi,\mu,\sigma}(u)$, the
rate of the one-dimensional Poisson process of exceedances of the level $u$, and
$\beta = \sigma + \xi(u - \mu)$, the scaling parameter of the implied GPD for the excess losses
over $u$. Then the intensity in (7.34) can be rewritten as

$$ \lambda(x) = \lambda(t, x) = \frac{\tau}{\beta}\left(1 + \xi\frac{x - u}{\beta}\right)^{-1/\xi - 1}, \tag{7.37} $$

where $\xi \in \mathbb{R}$ and $\tau, \beta > 0$. Using this parametrization it is easily verified that the
log of the likelihood in (7.36) becomes

$$ \ln L(\xi, \sigma, \mu; \tilde X_1, \ldots, \tilde X_{N_u}) = \ln L_1(\xi, \beta; \tilde X_1 - u, \ldots, \tilde X_{N_u} - u) + \ln L_2(\tau; N_u), $$

where $L_1$ is precisely the likelihood for fitting a GPD to excess losses given in (7.14)
and $\ln L_2(\tau; N_u) = -\tau + N_u \ln \tau$, which is the log-likelihood for a one-dimensional
homogeneous Poisson process with rate $\tau$. Such a partition of a log-likelihood into a
sum of two terms involving two different sets of parameters means that we can make
separate inferences about the two sets of parameters; we can estimate $\xi$ and $\beta$ in a
GPD analysis and then estimate $\tau$ by its MLE $N_u$ and use these to infer estimates
of $\mu$ and $\sigma$.
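
The last step, inferring $\mu$ and $\sigma$ from $(\xi, \beta, \tau)$, follows by inverting the two defining relations: since $\tau = (\beta/\sigma)^{-1/\xi}$ we get $\sigma = \beta \tau^{\xi}$, and then $\mu = u - (\beta - \sigma)/\xi$. A minimal sketch of the mapping in both directions, assuming $\xi \ne 0$ and using hypothetical function names:

```python
def pot_from_gpd(xi, beta, tau, u):
    """Map GPD scale beta and exceedance rate tau at threshold u to the
    point process parameters (mu, sigma). Requires xi != 0."""
    sigma = beta * tau**xi
    mu = u - (beta - sigma) / xi
    return mu, sigma


def gpd_from_pot(xi, mu, sigma, u):
    """Inverse map: GPD scale and exceedance rate at threshold u."""
    beta = sigma + xi * (u - mu)
    tau = (1.0 + xi * (u - mu) / sigma) ** (-1.0 / xi)
    return beta, tau


if __name__ == "__main__":
    # Rounded inputs quoted in Example 7.40: xi = 0.22, beta = 2.1, tau = 102, u = 2.75
    mu, sigma = pot_from_gpd(0.22, 2.1, 102.0, 2.75)
    print(mu, sigma)
    print(gpd_from_pot(0.22, mu, sigma, 2.75))
```

With the rounded inputs from Example 7.40 this gives values of roughly 19.6 for $\mu$ and 5.8 for $\sigma$, close to the reported 19.9 and 5.95; the small gap presumably reflects the rounding of the inputs.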


Advantages of the POT model formulation. One might ask what the advantages are of
approaching the modelling of extremes through the two-dimensional Poisson point
process model described by the intensity (7.34). One advantage is the fact
that the parameters $\xi$, $\mu$ and $\sigma$ in the Poisson point process model do not have any
theoretical dependence on the threshold chosen, unlike the parameter $\beta$ in the GPD
model, which appears in the theory as a function of the threshold $u$. In practice, we
would expect the estimated parameters of the Poisson model to be roughly stable
over a range of high thresholds, whereas the estimated $\beta$ parameter varies with
threshold choice.
For this reason the intensity (7.34) is a framework that is often used to introduce
covariate effects into extreme value modelling. One method of doing this is to replace
the parameters $\mu$ and $\sigma$ in (7.34) by parameters that vary over time as a function
of deterministic covariates. For example, we might have $\mu(t) = \alpha + \boldsymbol{\gamma}' \boldsymbol{y}(t)$, where
$\boldsymbol{y}(t)$ represents a vector of covariate values at time $t$. This would give us Poisson
processes that are also non-homogeneous in time.


Applicability of the POT model to return series data. We now turn to the use of 
the POT model with financial return data. An initial comment is that returns do 
not really form genuine point events in time, in contrast to recorded water levels 
or wind speeds, for example. Returns are discrete-time measurements that describe 
changes in value taking place over the course of, say, a day or a week. Nonetheless, 
we assume that if we take a longer-term perspective, such data can be approximated 
by point events in time. 

In Section 4.1.1 and in Figure 4.3 in particular, we saw evidence that, in contrast 
to iid data, exceedances of a high threshold for daily financial return series do not 
necessarily occur according to a homogeneous Poisson process. They tend instead 
to form clusters corresponding to episodes of high volatility. Thus the standard POT 
model is not directly applicable to financial return data. 

Theory suggests that for stochastic processes with extremal index $\theta < 1$, such
as GARCH processes, the extremal clusters themselves should occur according to 
a homogeneous Poisson process in time, so that the individual exceedances occur 
according to a Poisson cluster process (see, for example, Leadbetter 1991). Thus 
a suitable model for the occurrence and magnitude of exceedances in a financial 
return series might be some form of marked Poisson cluster process. 

Rather than attempting to specify the mechanics of cluster formation, it is quite 
common to try to circumvent the problem by declustering financial return data: we 
attempt to formally identify clusters of exceedances and then we apply the POT 
model to cluster maxima only. This method is obviously somewhat ad hoc, as there 
is usually no clear way of deciding where one cluster ends and another begins. A 
possible declustering algorithm is given by the runs method. In this method a run 
size r is fixed and two successive exceedances are said to belong to two different 
clusters if they are separated by a run of at least r values below the threshold (see 
EKM, pp. 422-424). In Figure 7.11 the DAX daily negative returns of Figure 4.3 
have been declustered with a run length of 10 trading days; this reduces the 100 
exceedances to 42 cluster maxima. 
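
The runs method described above can be sketched in a few lines. This is an illustrative implementation under the stated rule (a run of at least $r$ values not exceeding the threshold separates two clusters), not the code used to produce Figure 7.11; the function name and the toy data are hypothetical.

```python
def decluster_runs(losses, u, r):
    """Return (index, value) pairs for the cluster maxima of exceedances of u,
    where clusters are separated by at least r consecutive non-exceedances."""
    maxima = []
    best = None   # (index, value) of the largest exceedance in the open cluster
    below = 0     # number of non-exceedances since the last exceedance
    for i, x in enumerate(losses):
        if x > u:
            # A long enough quiet run closes the previous cluster.
            if best is not None and below >= r:
                maxima.append(best)
                best = None
            if best is None or x > best[1]:
                best = (i, x)
            below = 0
        else:
            below += 1
    if best is not None:
        maxima.append(best)
    return maxima


if __name__ == "__main__":
    toy = [0.0, 3.0, 4.0, 0.0, 5.0, 0.0, 0.0, 0.0, 2.5, 0.0, 6.0]
    print(decluster_runs(toy, u=2.0, r=2))
```

For the toy series, the five exceedances of 2.0 fall into two clusters, so two cluster maxima are returned. The POT model would then be fitted to these maxima only.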

However, it is not clear that applying the POT model to declustered data gives us 
a particularly useful model. We can estimate the rate of occurrence of clusters of 
extremes and say something about average cluster size; we can also derive a GPD 
model for excess losses over thresholds for cluster maxima (where standard errors 
for parameters may be more realistic than if we fitted the GPD to the dependent 
sample of all threshold exceedances). However, by neglecting the modelling of 
cluster formation, we cannot make more dynamic statements about the intensity of 
occurrence of threshold exceedances at any point in time. In the next section we will 
describe point process models that attempt to do just that. 
Figure 7.11. (a) DAX daily negative returns and a QQplot of their spacings as in Figure 4.3.
(b) Data have been declustered with the runs method using a run length of 10 trading days.
The spacings of the 42 cluster maxima are more consistent with a Poisson model.


Example 7.40 (POT analysis of AT&T weekly losses). We close this section with 
an example of a standard POT model applied to extremes in financial return data. To 
mitigate the clustering phenomenon discussed above we use weekly return data, as 
previously analysed in Examples 7.24 and 7.25. Recall that these yield 102 weekly 
percentage losses for the AT&T stock price exceeding a threshold of 2.75%. The 
data are shown in Figure 7.12, where we observe that the inter-exceedance times 
seem to have a roughly exponential distribution, although the discrete nature of the 
times and the relatively low value of n means that there are some tied values for the 
spacings, which makes the plot look a little granular. Another noticeable feature is 
that the exceedances of the threshold appear to become more frequent over time, 
which might be taken as evidence against the homogeneous Poisson assumption for 
threshold exceedances and against the implicit assumption that the underlying data 
form a realization from a stationary time series. It would be possible to consider a
POT model incorporating a trend of increasingly frequent exceedances, but we will
not go this far.

Figure 7.12. (a) Time series of AT&T weekly percentage losses from 1991 to 2000. (b) Corresponding
realization of the marked point process of exceedances of the threshold 2.75%.
(c) QQplot of inter-exceedance times against an exponential reference distribution. See
Example 7.40 for details.

We fit the standard two-dimensional Poisson model to the 102 exceedances of
the threshold 2.75% using the likelihood in (7.36) and obtain parameter estimates
$\hat\xi = 0.22$, $\hat\mu = 19.9$ and $\hat\sigma = 5.95$. The implied GPD scale parameter for the
distribution of excess losses over the threshold $u$ is $\hat\beta = \hat\sigma + \hat\xi(u - \hat\mu) = 2.1$, so we
have exactly the same estimates of $\xi$ and $\beta$ as in Example 7.24.

The estimated exceedance rate for the threshold $u = 2.75$ is given by
$\hat\tau(u) = -\ln H_{\hat\xi,\hat\mu,\hat\sigma}(u) = 102$, which is precisely the number of exceedances of that threshold,
as theory suggests. It is of more interest to look at estimated exceedance rates
for higher thresholds. For example, we get $\hat\tau(15) = 2.50$, which implies that losses
exceeding 15% occur as a Poisson process with rate 2.5 losses per 10-year period,
so that such a loss is, roughly speaking, a four-year event. Thus the Poisson model
gives us an alternative method of defining the return period of a stress event and
a more powerful way of calculating such a risk measure. Similarly we can invert
the problem to estimate return levels: suppose we define the 10-year return level as
that level which is exceeded according to a Poisson process with rate one loss per
10 years, then we can easily estimate the level in our model by calculating

$$ H^{-1}_{\hat\xi,\hat\mu,\hat\sigma}(\exp(-1)) = 19.9, $$

so the 10-year event is a weekly loss of roughly 20%. Using the profile likelihood
method in Section A.3.5 we could also give confidence intervals for such estimates.
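
The rate and return-level calculations in this example amount to evaluating $\tau(x) = (1 + \xi(x - \mu)/\sigma)^{-1/\xi}$ and its inverse. The sketch below uses the rounded estimates quoted above, so its output only approximately matches the reported figures; the function names are hypothetical and $\xi \ne 0$ is assumed.

```python
def exceedance_rate(x, xi, mu, sigma):
    """Poisson rate of exceeding level x per observation period (xi != 0)."""
    return (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi)


def return_level(rate, xi, mu, sigma):
    """Level that is exceeded at the given Poisson rate per observation period,
    obtained by solving exceedance_rate(x) = rate for x."""
    return mu + sigma * (rate ** (-xi) - 1.0) / xi


if __name__ == "__main__":
    xi, mu, sigma = 0.22, 19.9, 5.95     # rounded estimates from the example
    rate_15 = exceedance_rate(15.0, xi, mu, sigma)
    print(rate_15, 10.0 / rate_15)       # rate per 10 years, return period in years
    print(return_level(1.0, xi, mu, sigma))
```

A rate of one exceedance per period corresponds to $H(x) = \exp(-1)$, and the return level then reduces to $\mu$ itself, which is why the 10-year return level coincides with $\hat\mu = 19.9$.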
7.4.3 Self-Exciting Processes


In this section we move away from homogeneous Poisson models for the occurrence 
times of exceedance of high thresholds and consider self-exciting point processes, or 
so-called Hawkes processes. In these models a series of recent threshold exceedances 
causes the instantaneous risk of a threshold exceedance at the present point in time 
to be higher. The main area of application of these models has traditionally been 
in the modelling of earthquakes and their aftershocks; however, their structure also 
seems appropriate for modelling market shocks and the tremors that follow these. 

Given data $X_1, \ldots, X_n$ and a threshold $u$, we will assume as usual that there are
$N_u$ exceedances, comprising the data $\{(i, X_i) : 1 \le i \le n,\ X_i > u\}$. Note that
from now on we will express the time of an exceedance on the natural timescale
of the time series, so if, for example, the data are daily observations, then our
times are expressed in days. It will also be useful to have the alternative notation
$\{(T_j, \tilde X_j) : j = 1, \ldots, N_u\}$, which enumerates exceedances consecutively.

We first consider a model for exceedance times only. In point process notation we
let $Y_i = i\, I_{\{X_i > u\}}$, so $Y_i$ returns an exceedance time, in the event that one takes place
at time $i$, and returns zero otherwise. The point process of exceedances is the process
$N(\cdot)$ with state space $\mathcal{X} = (0, n]$ given by $N(A) = \sum_{i=1}^{n} I_{\{Y_i \in A\}}$ for $A \subset \mathcal{X}$.

We assume that the point process $N(\cdot)$ is a self-exciting process with conditional
intensity

$$ \lambda^*(t) = \tau + \psi \sum_{j : 0 < T_j < t} h(t - T_j, \tilde X_j - u), \tag{7.38} $$

where $\tau > 0$, $\psi \ge 0$ and $h$ is some positive-valued function. Each previous
exceedance $(T_j, \tilde X_j)$ contributes to the conditional intensity and the amount that it
contributes can depend on both the elapsed time $(t - T_j)$ since that exceedance and
the amount of the excess loss $(\tilde X_j - u)$ over the threshold. Informally, we understand
the conditional intensity as expressing the instantaneous chance of a new exceedance
of the threshold at time $t$, like the rate or intensity of an ordinary Poisson process.
However, in the self-exciting model, the conditional intensity is itself a stochastic
process which depends on $\omega$, the state of nature, through the history of threshold
exceedances up to (but not including) time $t$.
Possible parametric specifications of the $h$ function are

• $h(s, x) = \exp(\delta x - \gamma s)$, where $\delta, \gamma > 0$; or
• $h(s, x) = \exp(\delta x)(s + \gamma)^{-(\rho + 1)}$, where $\delta, \gamma, \rho > 0$.

Collecting all parameters in $\theta$, the likelihood takes the form

$$ L(\theta; \text{data}) = \exp\left(-\int_0^n \lambda^*(s)\,ds\right) \prod_{i=1}^{N_u} \lambda^*(T_i), $$

and may be maximized numerically to obtain parameter estimates.
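
For the first specification, $h(s, x) = \exp(\delta x - \gamma s)$, the integral in the likelihood is available in closed form, since each exceedance contributes $\psi \exp(\delta(\tilde X_j - u))(1 - \exp(-\gamma(n - T_j)))/\gamma$. The sketch below evaluates the conditional intensity (7.38) and the resulting log-likelihood; it is an illustration with hypothetical function names, and in practice the log-likelihood would be passed to a numerical optimizer.

```python
import math


def intensity(t, times, excesses, tau, psi, delta, gamma):
    """Conditional intensity (7.38) at time t for h(s, x) = exp(delta*x - gamma*s).
    Only exceedances strictly before t contribute."""
    total = 0.0
    for T, x in zip(times, excesses):
        if 0 < T < t:
            total += math.exp(delta * x - gamma * (t - T))
    return tau + psi * total


def log_likelihood(times, excesses, n, tau, psi, delta, gamma):
    """Log-likelihood of the exceedance times on (0, n]; excesses are X_j - u."""
    # Integral of the conditional intensity over (0, n], in closed form.
    integral = tau * n
    for T, x in zip(times, excesses):
        integral += psi * math.exp(delta * x) * (1.0 - math.exp(-gamma * (n - T))) / gamma
    log_sum = sum(
        math.log(intensity(T, times, excesses, tau, psi, delta, gamma))
        for T in times
    )
    return -integral + log_sum


if __name__ == "__main__":
    times = [3.0, 10.0, 11.0]       # toy exceedance times
    excesses = [0.5, 1.0, 0.2]      # toy excess losses over the threshold
    print(log_likelihood(times, excesses, 20, 0.1, 0.02, 0.1, 0.05))
```

Setting `psi = 0` recovers the log-likelihood of a homogeneous Poisson process with rate $\tau$ per unit time, $-n\tau + N_u \ln \tau$, which gives a simple sanity check.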


Example 7.41 (S&P daily percentage losses 1996-2003). We apply the self-exciting
process methodology to all daily percentage losses incurred by the Standard
& Poor’s index in the eight-year period 1996-2003 (2078 values). In Figure 7.13 the
loss data are shown as well as the point process of the 200 largest daily losses exceeding
a threshold of 1.50%. Clearly, there is clustering in the pattern of exceedance
data and the QQplot shows that the inter-exceedance times are not exponential.

Figure 7.13. (a) S&P daily percentage loss data. (b) Two hundred largest losses. (c) A
QQplot of inter-exceedance times against an exponential reference distribution. (d) The estimated
intensity of exceeding the threshold in a self-exciting model. See Example 7.41 for
details.

We fit the simpler self-exciting model with $h(s, x) = \exp(\delta x - \gamma s)$. The parameter
estimates (and standard errors) are $\hat\tau = 0.032$ (0.011), $\hat\psi = 0.016$ (0.0069),
$\hat\gamma = 0.026$ (0.011), $\hat\delta = 0.13$ (0.27), suggesting that all parameters except $\delta$ are
significant. The log-likelihood for the fitted model is $-648.2$, whereas the log-likelihood
for a homogeneous Poisson model is $-668.2$; thus the Poisson special
case can clearly be rejected. The final picture shows the estimated intensity $\lambda^*(t)$
of crossing the threshold throughout the data observation period, which seems to
reflect the pattern of exceedances observed.


Note that a simple refinement of this model (and those of the following section) 
would be to consider a self-exciting structure where both extreme negative and 
extreme positive returns contributed to the conditional intensity; this would involve 
setting upper and lower thresholds and considering exceedances of both. 


7.4.4 A Self-Exciting POT Model 


We now consider how the POT model of Section 7.4.2 might be generalized to 
incorporate a self-exciting component. We first develop a marked self-exciting model 
where marks have a generalized Pareto distribution, but are unpredictable, meaning
that the excess losses are iid GPD. In the second model we consider the case of 
predictable marks. In this model the excess losses are conditionally generalized 
Pareto, given the exceedance history up to the time of the mark, with a scaling 
parameter that depends on that history. In this way we get a model where, in a 
period of excitement, both the temporal intensity of occurrence and the magnitude 
of the exceedances increase. 

In point process language our models are processes $N(\cdot)$ on a state space of the
form $\mathcal{X} = (0, n] \times (u, \infty)$ such that $N(A) = \sum_{i=1}^{n} I_{\{(i, X_i) \in A\}}$ for sets $A \subset \mathcal{X}$. To
build these models we start with the intensity of the reparametrized version of the
standard POT model given in (7.37). We recall that this model simply says that
exceedances of the threshold $u$ occur as a homogeneous Poisson process with rate $\tau$
and that excesses have a generalized Pareto distribution with df $G_{\xi,\beta}$.


Model with unpredictable marks. We first introduce the notation
$v^*(t) = \sum_{j : 0 < T_j < t} h(t - T_j, \tilde X_j - u)$ for the self-excitement function, where the function $h$
is as in Section 7.4.3. We generalize (7.37) and consider a self-exciting model with
conditional intensity

$$ \lambda^*(t, x) = \frac{\tau + \psi v^*(t)}{\beta}\left(1 + \xi\frac{x - u}{\beta}\right)^{-1/\xi - 1} \tag{7.39} $$

on a state space $\mathcal{X} = (0, n] \times (u, \infty)$, where $\tau > 0$ and $\psi \ge 0$. Effectively, we
have combined the one-dimensional intensity in (7.38) with a GPD density. When
$\psi = 0$ we have an ordinary POT model with no self-exciting structure.

It is easy to calculate that the conditional rate of crossing the threshold $x \ge u$ at
time $t$, given information up to that time, is

$$ \tau^*(t, x) = \int_x^{\infty} \lambda^*(t, y)\,dy = (\tau + \psi v^*(t))\left(1 + \xi\frac{x - u}{\beta}\right)^{-1/\xi}, \tag{7.40} $$

which, for fixed $x$, is simply a one-dimensional self-exciting process of the form
(7.38). The implied distribution of the excess losses when an exceedance takes place
is generalized Pareto, because

$$ \frac{\tau^*(t, u + x)}{\tau^*(t, u)} = \left(1 + \frac{\xi x}{\beta}\right)^{-1/\xi} = \bar G_{\xi,\beta}(x), \tag{7.41} $$

independently of $t$. Statistical fitting of this model is performed by maximizing a
likelihood of the form

$$ L(\theta; \text{data}) = \exp\left(-n\tau - \psi \int_0^n v^*(s)\,ds\right) \prod_{j=1}^{N_u} \lambda^*(T_j, \tilde X_j). \tag{7.42} $$


A model with predictable marks. A model with predictable marks can be obtained
by generalizing (7.39) to get

$$ \lambda^*(t, x) = \frac{\tau + \psi v^*(t)}{\beta + \alpha v^*(t)}\left(1 + \xi\frac{x - u}{\beta + \alpha v^*(t)}\right)^{-1/\xi - 1}, \tag{7.43} $$

where $\beta > 0$ and $\alpha \ge 0$. For simplicity we have assumed that the GPD scaling is
also linear in the self-excitement function $v^*(t)$. The properties of this model follow
immediately from the model with unpredictable marks. The conditional crossing
rate of the threshold $x \ge u$ at time $t$ is as in (7.40) with the parameter $\beta$ replaced by
the time-dependent self-exciting function $\beta + \alpha v^*(t)$. By repeating the calculation
in (7.41) we find that the distribution of the excess loss over the threshold, given
that an exceedance takes place at time $t$ and given the history of exceedances up
to time $t$, is generalized Pareto with df $G_{\xi,\beta + \alpha v^*(t)}$. The likelihood for fitting the
model is again (7.42), where the function $\lambda^*(t, x)$ is now given by (7.43). Note that
by comparing a model with $\alpha = 0$ and a model with $\alpha > 0$ we can formally test the
hypothesis that the marks are unpredictable using a likelihood ratio test.


Example 7.42 (self-exciting POT model for S&P daily loss data). We continue 
the analysis of the data of Example 7.41 by fitting self-exciting POT models with 
both unpredictable and predictable marks to the 200 exceedances of the threshold 
u = 1.5%. The former is equivalent to fitting a self-exciting model to the exceedance 
times as in Example 7.41 and then fitting a GPD to the excess losses over the 
threshold; thus the estimated intensity of crossing the threshold is identical to the 
one shown in Figure 7.13. The log-likelihood for this model is $-783.4$, whereas a
model with predictable marks gives a value of $-779.3$ for one extra parameter $\alpha$;
in a likelihood ratio test the p-value is 0.004, showing a significant improvement.
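
The reported p-value can be reproduced from the two log-likelihoods. The sketch below assumes the usual chi-squared reference distribution with one degree of freedom for the likelihood ratio statistic (an assumption on our part, chosen because it matches the reported figure) and uses the identity $P(\chi^2_1 > s) = \operatorname{erfc}(\sqrt{s/2})$. The function names are hypothetical.

```python
import math


def chi2_sf_1df(s):
    """Survival function of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(s / 2.0))


def likelihood_ratio_test(loglik_null, loglik_alt):
    """Return the likelihood ratio statistic and its p-value for one extra parameter."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf_1df(stat)


if __name__ == "__main__":
    stat, p_value = likelihood_ratio_test(-783.4, -779.3)
    print(stat, p_value)
```

The statistic is $2 \times (783.4 - 779.3) = 8.2$, and the resulting p-value rounds to 0.004.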

In Figure 7.14 we show the exceedance data as well as the estimated intensity
$\tau^*(t, u)$ of exceeding the threshold in the model with predictable marks. We also
show the estimated mean of the GPD for the conditional distribution of the excess
loss above the threshold, given that an exceedance takes place at time $t$. The GPD
mean $(\beta + \alpha v^*(t))/(1 - \xi)$ and the intensity $\tau^*(t, u)$ are both affine functions of
the self-excitement function $v^*(t)$ and obviously follow its path.


Calculating conditional risk measures. Finally, we note that self-exciting POT
models can be used to estimate a kind of analogue of a conditional VaR and also
a conditional expected shortfall. If we have analysed $n$ daily data ending on day $t$
and want to calculate, say, a 99% VaR, then we treat the problem as a (conditional)
return-level problem; we look for the level at which the conditional exceedance
intensity at a time point just after $t$ (denoted by $t+$) is 0.01. In general, to calculate
a conditional estimate of $\mathrm{VaR}^t_\alpha$ (for $\alpha$ sufficiently large) we would attempt to solve
the equation $\tau^*(t+, x) = 1 - \alpha$ for some value of $x$ satisfying $x \ge u$. In the
model with predictable marks this is possible if $\tau + \psi v^*(t+) > 1 - \alpha$ and gives
the formula

$$ \mathrm{VaR}^t_\alpha = u + \frac{\beta + \alpha v^*(t+)}{\xi}\left(\left(\frac{1 - \alpha}{\tau + \psi v^*(t+)}\right)^{-\xi} - 1\right). $$

Note that in this formula $\alpha$ in the term $\beta + \alpha v^*(t+)$ is the model parameter of (7.43),
whereas $\alpha$ in $1 - \alpha$ and in $\mathrm{VaR}^t_\alpha$ is the confidence level.
The associated conditional expected shortfall could then be calculated by observing
that the conditional distribution of excess losses above $\mathrm{VaR}^t_\alpha$ given information up to
time $t$ is GPD with shape parameter $\xi$ and scaling parameter given by
$\beta + \alpha v^*(t+) + \xi(\mathrm{VaR}^t_\alpha - u)$.
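
A minimal sketch of these two calculations follows. It takes the current value $v = v^*(t+)$ of the self-excitement function as an input, uses the GPD mean (scale divided by $1 - \xi$, valid for $\xi < 1$) for the expected excess over VaR, and assumes $\xi \ne 0$. To avoid the notational clash, the model parameter is called `alpha_mark` and the confidence level `level`; these names, like the parameter values in the usage example, are hypothetical.

```python
def crossing_rate(x, v, u, xi, beta, tau, psi, alpha_mark):
    """Conditional rate of exceeding level x >= u, as in (7.40) with the
    scale beta replaced by beta + alpha_mark * v."""
    scale = beta + alpha_mark * v
    return (tau + psi * v) * (1.0 + xi * (x - u) / scale) ** (-1.0 / xi)


def conditional_var(level, v, u, xi, beta, tau, psi, alpha_mark):
    """Level x >= u at which the conditional crossing rate equals 1 - level."""
    rate_at_u = tau + psi * v
    if rate_at_u <= 1.0 - level:
        raise ValueError("threshold crossing rate too low for this level")
    scale = beta + alpha_mark * v
    return u + scale / xi * (((1.0 - level) / rate_at_u) ** (-xi) - 1.0)


def conditional_es(level, v, u, xi, beta, tau, psi, alpha_mark):
    """VaR plus the mean of the GPD of excesses over VaR (requires xi < 1)."""
    var = conditional_var(level, v, u, xi, beta, tau, psi, alpha_mark)
    scale_at_var = beta + alpha_mark * v + xi * (var - u)
    return var + scale_at_var / (1.0 - xi)


if __name__ == "__main__":
    params = dict(u=1.5, xi=0.1, beta=0.5, tau=0.03, psi=0.02, alpha_mark=0.05)
    var = conditional_var(0.99, 2.0, **params)
    print(var, conditional_es(0.99, 2.0, **params))
    print(crossing_rate(var, 2.0, **params))   # should be close to 0.01
```

Plugging the computed VaR back into `crossing_rate` returns $1 - \alpha$ up to rounding, which confirms that the formula inverts the conditional crossing rate.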
Figure 7.14. (a) Exceedance pattern for 200 largest daily losses in S&P data. (b) Estimated
intensity of exceeding the threshold in a self-exciting POT model with predictable marks.
(c) Mean of the conditional generalized Pareto distribution of the excess loss above the
threshold. See Example 7.42 for details.


Notes and Comments 


For more information about point processes consult EKM, Cox and Isham (1980), 
Kallenberg (1983) and Resnick (1987). The point process approach to extremes 
dates back to Pickands (1971) and is also discussed in Leadbetter, Lindgren and 
Rootzén (1983), Leadbetter (1991) and Falk, Hüsler and Reiss (1994).

The two-dimensional Poisson point process model was first used in practice by 
Smith (1989) and may also be found in Smith and Shively (1995); both these papers 
discuss the adaptation of the point process model to incorporate covariates or time 
trends in the context of environmental data. An insurance application is treated 
in Smith and Goodman (2000), which also treats the point process model from a 
Bayesian perspective. An interesting application to wind storm losses is Rootzén 
and Tajvidi (1997). A further application of the bivariate point process framework to 
model insurance loss data showing trends in both intensity and severity of occurrence 
is found in McNeil and Saladin (2000). For further applications to insurance and 
finance, see Chavez-Demoulin and Embrechts (2004). An excellent overview of 
statistical approaches to the GPD and point process models is found in Coles (2001). 

The derivation of likelihoods for point processes is beyond the scope of this book
and we have simply recorded the likelihoods to be maximized without further justifi- 
cation. See Daley and Vere-Jones (2003, Chapter 7) for more details on this subject; 
see also Coles (2001, p. 127) for a good intuitive account in the Poisson case. 

The original reference to the Hawkes self-exciting process is Hawkes (1971). 
There is a large literature on the application of such processes to earthquake mod- 
elling; a starter reference is Ogata (1988). The application to financial data was 
suggested in Chavez-Demoulin, Davison and McNeil (2005). The idea of a POT
model with self-exciting structure explored in Section 7.4.4 is new. 


7.5 Multivariate Maxima 


In this section we give a brief overview of the theory of multivariate maxima, stating 
the main results in terms of copulas. A class of copulas known as extreme value cop- 
ulas emerges as the class of natural limiting dependence structures for multivariate 
maxima. These provide useful dependence structures for modelling the joint tail 
behaviour of risk factors that appear to show tail dependence. The main reference 
is Galambos (1987), which is one of the few texts to treat the theory of multivariate 
maxima as a copula theory (although Galambos does not use the word, referring to 
copulas simply as dependence functions). 


7.5.1 Multivariate Extreme Value Copulas 


Let $\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n$ be iid random vectors in $\mathbb{R}^d$ with joint df $F$ and marginal dfs
$F_1, \ldots, F_d$. We label the components of these vectors $\boldsymbol{X}_i = (X_{i,1}, \ldots, X_{i,d})'$ and
interpret them as losses of $d$ different types. We define the maximum of the $j$th
component to be $M_{n,j} = \max(X_{1,j}, \ldots, X_{n,j})$, $j = 1, \ldots, d$. In classical multivariate
EVT the object of interest is the vector of componentwise block maxima:
$\boldsymbol{M}_n = (M_{n,1}, \ldots, M_{n,d})'$. In particular, we are interested in the possible multivariate
limiting distributions for $\boldsymbol{M}_n$ under appropriate normalizations, much as in the univariate
case. It should, however, be observed that the vector $\boldsymbol{M}_n$ will in general not
correspond to any of the vector observations $\boldsymbol{X}_i$.
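We seek limit laws for

$$ \frac{\boldsymbol{M}_n - \boldsymbol{d}_n}{\boldsymbol{c}_n} = \left(\frac{M_{n,1} - d_{n,1}}{c_{n,1}}, \ldots, \frac{M_{n,d} - d_{n,d}}{c_{n,d}}\right)' $$

as $n \to \infty$, where $\boldsymbol{c}_n = (c_{n,1}, \ldots, c_{n,d})'$ and $\boldsymbol{d}_n = (d_{n,1}, \ldots, d_{n,d})'$ are vectors
of normalizing constants, the former satisfying $\boldsymbol{c}_n > \boldsymbol{0}$. Note that in this and
other statements in this section, arithmetic operations on vectors of equal length are
understood as componentwise operations. Supposing that $(\boldsymbol{M}_n - \boldsymbol{d}_n)/\boldsymbol{c}_n$ converges
in distribution to a random vector with joint df $H$, we have

$$ \lim_{n \to \infty} P\left(\frac{\boldsymbol{M}_n - \boldsymbol{d}_n}{\boldsymbol{c}_n} \le \boldsymbol{x}\right) = \lim_{n \to \infty} F^n(\boldsymbol{c}_n \boldsymbol{x} + \boldsymbol{d}_n) = H(\boldsymbol{x}). \tag{7.44} $$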


Definition 7.43 (MEV distribution and domain of attraction). If (7.44) holds
for some $F$ and some $H$, we say that $F$ is in the maximum domain of attraction
of $H$, written $F \in \mathrm{MDA}(H)$, and we refer to $H$ as a multivariate extreme value
distribution (MEV distribution).


The convergence issue for multivariate maxima is already partly solved by the 
univariate theory. If H has non-degenerate margins, then these must be univariate 
extreme value distributions of Fréchet, Gumbel or Weibull type. Since these are con- 
tinuous, Sklar’s Theorem tells us that H must have a unique copula. The following 
theorem asserts that this copula $C$ must have a particular kind of scaling behaviour.


Theorem 7.44 (EV copula). If (7.44) holds for some $F$ and some $H$ with GEV
margins, then the unique copula $C$ of $H$ satisfies


$$C(u^t) = C^t(u), \quad \forall t > 0. \tag{7.45}$$


Any copula with the property (7.45) is known as an extreme value copula (EV cop- 
ula) and can be the copula of an MEV distribution. The independence and comono- 
tonicity copulas are EV copulas and the Gumbel copula provides an example of a 
parametric EV copula family. The bivariate version in (5.11) obviously has prop- 
erty (7.45), as does the exchangeable higher-dimensional Gumbel copula based 
on (5.38) as well as the non-exchangeable versions based on (5.43)-(5.45). 

There are a number of mathematical results characterizing MEV distributions and 
EV copulas. One such result is the following. 


Theorem 7.45 (Pickands representation). The copula $C$ is an EV copula if and
only if it has the representation


$$C(u) = \exp\left\{ B\left(\frac{\ln u_1}{\sum_{k=1}^d \ln u_k}, \ldots, \frac{\ln u_d}{\sum_{k=1}^d \ln u_k}\right) \sum_{i=1}^d \ln u_i \right\}, \tag{7.46}$$


where $B(w) = \int_{S_d} \max(x_1 w_1, \ldots, x_d w_d)\,dH(x)$ and $H$ is a finite measure on the
$d$-dimensional simplex, i.e. the set $S_d = \{x : x_i \ge 0,\ i = 1, \ldots, d,\ \sum_i x_i = 1\}$.


The function B(w) is sometimes referred to as the dependence function of the EV 
copula. In the general case, such functions are difficult to visualize and work with, 
but in the bivariate case they have a simple form which we discuss in more detail. 

In the bivariate case we redefine $B(w)$ as a function of a scalar argument by
setting $A(w) := B((w, 1 - w)')$ with $w \in [0, 1]$. It follows from Theorem 7.45 that
a bivariate copula is an EV copula if and only if it takes the form


$$C(u_1, u_2) = \exp\left\{ (\ln u_1 + \ln u_2)\, A\left(\frac{\ln u_1}{\ln u_1 + \ln u_2}\right) \right\}, \tag{7.47}$$


where $A(w) = \int_0^1 \max((1 - x)w,\ x(1 - w))\,dH(x)$ for a measure $H$ on $[0, 1]$. It
can be inferred that such bivariate dependence functions must satisfy


$$\max(w, 1 - w) \le A(w) \le 1, \quad 0 \le w \le 1, \tag{7.48}$$


and must moreover be convex. Conversely, a differentiable, convex function $A(w)$
satisfying (7.48) can be used to construct an EV copula using (7.47).

The upper and lower bounds in (7.48) have intuitive interpretations. If $A(w) = 1$
for all $w$, then the copula (7.47) is clearly the independence copula, and if
$A(w) = \max(w, 1 - w)$, then it is the comonotonicity copula. It is also useful to note, and easy
to show, that we can extract the dependence function from the EV copula in (7.47)
by setting

$$A(w) = -\ln C(e^{-w}, e^{-(1-w)}), \quad w \in [0, 1]. \tag{7.49}$$
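The relationship between (7.45), (7.47) and (7.49) can be checked numerically. The following is a minimal sketch rather than a definitive implementation: the function names `gumbel_A`, `ev_copula` and `recover_A` are our own illustrative choices, the symmetric Gumbel dependence function $A(w) = (w^\theta + (1-w)^\theta)^{1/\theta}$ is used as the test case, and arguments are assumed to lie in $(0, 1]$.

```python
import math

def gumbel_A(w, theta):
    """Dependence function of the symmetric Gumbel copula (assumed test case)."""
    return (w ** theta + (1.0 - w) ** theta) ** (1.0 / theta)

def ev_copula(u1, u2, A):
    """Bivariate EV copula built from a dependence function A via (7.47)."""
    if u1 == 1.0 and u2 == 1.0:
        return 1.0  # ln u1 + ln u2 = 0 here, so treat the corner separately
    s = math.log(u1) + math.log(u2)
    return math.exp(s * A(math.log(u1) / s))

def recover_A(C, w):
    """Extract the dependence function from an EV copula C via (7.49)."""
    return -math.log(C(math.exp(-w), math.exp(-(1.0 - w))))

theta = 2.0
A = lambda w: gumbel_A(w, theta)
C = lambda u1, u2: ev_copula(u1, u2, A)

u1, u2, t = 0.3, 0.7, 2.5
lhs = C(u1 ** t, u2 ** t)   # C(u^t)
rhs = C(u1, u2) ** t        # C^t(u)
print("scaling property (7.45):", lhs, rhs)
print("recovered A(0.4):", recover_A(C, 0.4), "direct A(0.4):", A(0.4))
```

Passing the constant function `lambda w: 1.0` as `A` returns the product $u_1 u_2$, in line with the interpretation of the upper bound in (7.48).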


Example 7.46 (Gumbel copula). We consider the asymmetric version of the bivariate
Gumbel copula defined by (5.11) and construction (5.43), i.e. the copula


$$C^{\mathrm{Gu}}_{\theta,\alpha,\beta}(u_1, u_2) = u_1^{1-\alpha} u_2^{1-\beta} \exp\{-((-\alpha \ln u_1)^\theta + (-\beta \ln u_2)^\theta)^{1/\theta}\}.$$


Figure 7.15. Plot of dependence functions for (a) the symmetric Gumbel, (b) the asymmetric
Gumbel, (c) the symmetric Galambos and (d) the asymmetric Galambos copulas (asymmetric
cases have $\alpha = 0.9$ and $\beta = 0.8$) as described in Examples 7.46 and 7.47. Dashed lines show
boundaries of the triangle in which the dependence function must reside; solid lines show
dependence functions for a range of parameter values.


As already remarked, this copula has the scaling property (7.45) and is an EV copula.
Using (7.49) we calculate that the dependence function is given by


$$A(w) = (1 - \alpha)w + (1 - \beta)(1 - w) + ((\alpha w)^\theta + (\beta(1 - w))^\theta)^{1/\theta}.$$


We have plotted this function in Figure 7.15 for a range of $\theta$ values running from 1.1
to 10 in steps of size 0.1. Part (a) shows the standard symmetric Gumbel copula with
$\alpha = \beta = 1$; the dependence function essentially spans the whole range from
independence, represented by the upper edge of the dashed triangle, to comonotonicity,
represented by the two lower edges of the dashed triangle which comprise the
function $A(w) = \max(w, 1 - w)$. Part (b) shows an asymmetric example with $\alpha = 0.9$
and $\beta = 0.8$; in this case we still have independence when $\theta = 1$, but the limit as
$\theta \to \infty$ is no longer the comonotonicity model. The Gumbel copula model is also
sometimes known as the logistic model.
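The behaviour described in Example 7.46 can be illustrated with a few lines of code. This is a sketch under our own naming (`asym_gumbel_A` and `within_bounds` are hypothetical helpers), and a large finite value $\theta = 50$ stands in for the limit $\theta \to \infty$.

```python
def asym_gumbel_A(w, theta, alpha, beta):
    """Dependence function of the asymmetric Gumbel copula of Example 7.46."""
    return ((1.0 - alpha) * w + (1.0 - beta) * (1.0 - w)
            + ((alpha * w) ** theta + (beta * (1.0 - w)) ** theta) ** (1.0 / theta))

def within_bounds(A, n=200, tol=1e-12):
    """Check the bounds (7.48) on an equally spaced grid of n + 1 points in [0, 1]."""
    grid = [i / n for i in range(n + 1)]
    return all(max(w, 1.0 - w) - tol <= A(w) <= 1.0 + tol for w in grid)

# theta = 1 gives A(w) = 1 (independence) whatever alpha and beta are.
print([round(asym_gumbel_A(w, 1.0, 0.9, 0.8), 12) for w in (0.0, 0.25, 0.5, 1.0)])

# For large theta the symmetric case approaches max(w, 1 - w) = 0.5 at w = 0.5,
# whereas the asymmetric case with alpha = 0.9, beta = 0.8 stays clearly above it.
print(asym_gumbel_A(0.5, 50.0, 1.0, 1.0), asym_gumbel_A(0.5, 50.0, 0.9, 0.8))
print(within_bounds(lambda w: asym_gumbel_A(w, 3.0, 0.9, 0.8)))
```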


Example 7.47 (Galambos copula). This time we begin with the dependence function
given by

$$A(w) = 1 - ((\alpha w)^{-\theta} + (\beta(1 - w))^{-\theta})^{-1/\theta}, \tag{7.50}$$


where $0 \le \alpha, \beta \le 1$ and $0 < \theta < \infty$. It can be verified that this is a convex function
satisfying $\max(w, 1 - w) \le A(w) \le 1$ for $0 \le w \le 1$, so it can be used to create
an EV copula in (7.47). We obtain the copula


$$C^{\mathrm{Gal}}_{\theta,\alpha,\beta}(u_1, u_2) = u_1 u_2 \exp\{((-\alpha \ln u_1)^{-\theta} + (-\beta \ln u_2)^{-\theta})^{-1/\theta}\},$$


which has also been called the negative logistic model. We have plotted this function
in Figure 7.15 for a range of $\theta$ values running from 0.2 to 5 in steps of size 0.1.
Part (c) shows the standard symmetric case with $\alpha = \beta = 1$ spanning the whole
range from independence to comonotonicity. Part (d) shows an asymmetric example
with $\alpha = 0.9$ and $\beta = 0.8$; in this case we still approach independence as $\theta \to 0$,
but the limit as $\theta \to \infty$ is no longer the comonotonicity model.
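As a numerical companion to Example 7.47, the sketch below evaluates the Galambos dependence function and copula, recovers $A$ from the copula via (7.49), and checks convexity and the bounds (7.48) on a grid. The helper names are our own, and the code assumes $\alpha, \beta > 0$ so that negative powers are well defined.

```python
import math

def galambos_A(w, theta, alpha=1.0, beta=1.0):
    """Dependence function (7.50); assumes alpha, beta > 0."""
    if w <= 0.0 or w >= 1.0:
        return 1.0  # limiting value at the endpoints of [0, 1]
    return 1.0 - ((alpha * w) ** (-theta) + (beta * (1.0 - w)) ** (-theta)) ** (-1.0 / theta)

def galambos_copula(u1, u2, theta, alpha=1.0, beta=1.0):
    """Galambos copula of Example 7.47 for u1, u2 in (0, 1]."""
    a, b = -math.log(u1), -math.log(u2)
    if a == 0.0 or b == 0.0:
        return u1 * u2  # the exponential factor tends to 1 on the upper boundary
    return u1 * u2 * math.exp(((alpha * a) ** (-theta) + (beta * b) ** (-theta)) ** (-1.0 / theta))

def is_convex_on_grid(A, n=200, tol=1e-12):
    """Second differences of A on an equally spaced grid should be non-negative."""
    h = 1.0 / n
    return all(A((i - 1) * h) + A((i + 1) * h) - 2.0 * A(i * h) >= -tol for i in range(1, n))

def satisfies_bounds(A, n=200, tol=1e-12):
    """Bounds (7.48) checked on the same grid."""
    return all(max(i / n, 1.0 - i / n) - tol <= A(i / n) <= 1.0 + tol for i in range(n + 1))

theta, alpha, beta = 1.5, 0.9, 0.8
A = lambda w: galambos_A(w, theta, alpha, beta)
w = 0.3
recovered = -math.log(galambos_copula(math.exp(-w), math.exp(-(1.0 - w)), theta, alpha, beta))
print(recovered, A(w), is_convex_on_grid(A), satisfies_bounds(A))
```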


A number of other bivariate EV copulas have been described in the literature (see 
Notes and Comments). 


7.5.2 Copulas for Multivariate Minima 


The structure of limiting copulas for multivariate minima can be easily inferred from 
the structure of limiting copulas for multivariate maxima; moving from maxima to 
minima essentially involves the same considerations that we made at the end of 
Section 7.1.1 and uses identity (7.2) in particular. 

Normalized componentwise minima of iid random vectors $X_1, \ldots, X_n$ with df
$F$ will converge in distribution to a non-degenerate limit if the df $\tilde F$ of the
random vectors $-X_1, \ldots, -X_n$ is in the maximum domain of attraction of an MEV
distribution (see Definition 7.43), written $\tilde F \in \mathrm{MDA}(H)$. Of course, for a radially
symmetric distribution, $\tilde F$ coincides with $F$.

Let $M_n^*$ be the vector of componentwise maxima of $-X_1, \ldots, -X_n$ so that
$M_{n,j}^* = \max(-X_{1,j}, \ldots, -X_{n,j})$. If $\tilde F \in \mathrm{MDA}(H)$ for some non-degenerate $H$,
we have

$$\lim_{n \to \infty} P\left(\frac{M_n^* - d_n}{c_n} \le x\right) = \lim_{n \to \infty} \tilde F^n(c_n x + d_n) = H(x) \tag{7.51}$$

for appropriate sequences of normalizing vectors $c_n$ and $d_n$ and an MEV distribution
$H$ of the form $H(x) = C(H_{\xi_1}(x_1), \ldots, H_{\xi_d}(x_d))$, where $H_{\xi_j}$ denotes a GEV
distribution with shape parameter $\xi_j$ and $C$ is an EV copula satisfying (7.45).

Defining the vector of componentwise minima by $m_n$ and using (7.2), it follows
from (7.51) that

$$\lim_{n \to \infty} P\left(\frac{m_n + d_n}{c_n} \ge x\right) = H(-x),$$

so that normalized minima converge in distribution to a limit with survival function
$H(-x) = C(H_{\xi_1}(-x_1), \ldots, H_{\xi_d}(-x_d))$. It follows that the copula of the limiting
distribution of the minima is the survival copula of $C$ (see Section 5.1.5 for discussion
of survival copulas). In general, the limiting copulas for minima are survival copulas
of EV copulas and concrete examples of such copulas are the Gumbel and Galambos
survival copulas.

In the special case of a radially symmetric underlying distribution, the limiting 
copula of the minima is precisely the survival copula of the limiting EV copula of 
the maxima. 


7.5.3 Copula Domains of Attraction 


As in the case of univariate maxima we would like to know which underlying
multivariate dfs $F$ are attracted to which MEV distributions $H$. We now give a
useful result in terms of copulas which is essentially due to Galambos (see Notes
and Comments).


Theorem 7.48. Let $F(x) = C(F_1(x_1), \ldots, F_d(x_d))$ for continuous marginal dfs
$F_1, \ldots, F_d$ and some copula $C$. Let $H(x) = C_0(H_1(x_1), \ldots, H_d(x_d))$ be an MEV
distribution with EV copula $C_0$. Then $F \in \mathrm{MDA}(H)$ if and only if $F_i \in \mathrm{MDA}(H_i)$
for $1 \le i \le d$ and


$$\lim_{t \to \infty} C^t(u_1^{1/t}, \ldots, u_d^{1/t}) = C_0(u_1, \ldots, u_d), \quad u \in [0, 1]^d. \tag{7.52}$$


This result shows that the copula $C_0$ of the limiting MEV distribution is determined
solely by the copula $C$ of the underlying distribution according to (7.52); the
marginal distributions of $F$ determine the margins of the MEV limit but are irrelevant
to the determination of its dependence structure. This motivates us to introduce
the concept of a copula domain of attraction.


Definition 7.49. If (7.52) holds for some $C$ and some EV copula $C_0$, we say that
$C$ is in the copula domain of attraction of $C_0$, written $C \in \mathrm{CDA}(C_0)$.


There are a number of equivalent ways of writing (7.52). First, by taking logarithms
and using the asymptotic identity $\ln(x) \sim x - 1$ as $x \to 1$, we get, for
$u \in (0, 1]^d$,


$$\begin{aligned}
\lim_{t \to \infty} t\,(1 - C(u_1^{1/t}, \ldots, u_d^{1/t})) &= -\ln C_0(u_1, \ldots, u_d), \\
\lim_{s \to 0^+} \frac{1 - C(u_1^s, \ldots, u_d^s)}{s} &= -\ln C_0(u_1, \ldots, u_d).
\end{aligned} \tag{7.53}$$


By inserting $u_i = \exp(-x_i)$ in the latter identity and using $\exp(-sx) \sim 1 - sx$ as
$s \to 0$, we get, for $x \in [0, \infty)^d$,

$$\lim_{s \to 0^+} \frac{1 - C(1 - s x_1, \ldots, 1 - s x_d)}{s} = -\ln C_0(e^{-x_1}, \ldots, e^{-x_d}). \tag{7.54}$$


Example 7.50 (limiting copula for bivariate Pareto distribution). In Example 5.12
we saw that the bivariate Pareto distribution has univariate Pareto margins
$F_i(x) = 1 - (\kappa_i/(\kappa_i + x))^\alpha$ and Clayton survival copula. It follows from
Example 7.6 that $F_i \in \mathrm{MDA}(H_{1/\alpha})$, $i = 1, 2$. Using (5.14) the Clayton survival copula is
calculated to be $C(u_1, u_2) = u_1 + u_2 - 1 + ((1 - u_1)^{-1/\alpha} + (1 - u_2)^{-1/\alpha} - 1)^{-\alpha}$.
Using (7.54) it is easily calculated that
$C_0(u_1, u_2) = u_1 u_2 \exp(((-\ln u_1)^{-1/\alpha} + (-\ln u_2)^{-1/\alpha})^{-\alpha})$,
which is the standard exchangeable Galambos copula of Example 7.47 with $\theta = 1/\alpha$.
Thus the limiting distribution of maxima consists of two Fréchet dfs connected by
the Galambos copula.


The coefficients of upper tail dependence play an interesting role in the copula 
domain of attraction theory. In particular, they can help us to recognize copulas that 
lie in the copula domain of attraction of the independence copula. 


Proposition 7.51. Let $C$ be a bivariate copula with upper tail-dependence coefficient
$\lambda_u$ and assume that $C$ satisfies $C \in \mathrm{CDA}(C_0)$ for some EV copula $C_0$. Then $\lambda_u$
is also the upper tail-dependence coefficient of $C_0$ and is related to its dependence
function by $\lambda_u = 2(1 - A(\tfrac{1}{2}))$.

Proof. We use (5.28) and (5.14) to see that


$$\lambda_u = \lim_{q \to 1^-} \frac{\hat C(1 - q, 1 - q)}{1 - q} = 2 - \lim_{q \to 1^-} \frac{1 - C(q, q)}{1 - q}.$$


By using the asymptotic identity $\ln x \sim x - 1$ as $x \to 1$ and the CDA condition (7.53)
we can calculate


$$\begin{aligned}
\lim_{q \to 1^-} \frac{1 - C_0(q, q)}{1 - q}
&= \lim_{q \to 1^-} \frac{\ln C_0(q, q)}{\ln q}
= \lim_{q \to 1^-} \lim_{s \to 0^+} \frac{1 - C(q^s, q^s)}{-s \ln q} \\
&= \lim_{q \to 1^-} \lim_{s \to 0^+} \frac{1 - C(q^s, q^s)}{-\ln(q^s)} \\
&= \lim_{v \to 1^-} \frac{1 - C(v, v)}{1 - v},
\end{aligned}$$


which shows that $C$ and $C_0$ share the same coefficient of upper tail dependence.
Using the formula $\lambda_u = 2 - \lim_{q \to 1^-} \ln C_0(q, q)/\ln q$ and the representation (7.47)
we easily obtain that $\lambda_u = 2(1 - A(\tfrac{1}{2}))$.
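The formula of Proposition 7.51 is easy to evaluate for the two parametric families discussed above. The following sketch uses our own function names and compares the value $2(1 - A(\tfrac{1}{2}))$ with the diagonal expression $2 - \ln C_0(q, q)/\ln q$ used in the proof; the closed forms $2 - 2^{1/\theta}$ (Gumbel) and $2^{-1/\theta}$ (Galambos) follow directly from the respective dependence functions with $\alpha = \beta = 1$.

```python
import math

def gumbel_A(w, theta):
    """Symmetric Gumbel dependence function."""
    return (w ** theta + (1.0 - w) ** theta) ** (1.0 / theta)

def galambos_A(w, theta):
    """Symmetric Galambos dependence function, (7.50) with alpha = beta = 1, 0 < w < 1."""
    return 1.0 - (w ** (-theta) + (1.0 - w) ** (-theta)) ** (-1.0 / theta)

def upper_tail_dependence(A):
    """Coefficient of upper tail dependence of an EV copula with dependence function A."""
    return 2.0 * (1.0 - A(0.5))

def diagonal_ratio(A, q):
    """2 - ln C0(q, q) / ln q, with C0 built from A via (7.47)."""
    log_c0 = 2.0 * math.log(q) * A(0.5)
    return 2.0 - log_c0 / math.log(q)

theta = 2.0
lam_gumbel = upper_tail_dependence(lambda w: gumbel_A(w, theta))
lam_galambos = upper_tail_dependence(lambda w: galambos_A(w, theta))
print(lam_gumbel, 2.0 - 2.0 ** (1.0 / theta))
print(lam_galambos, 2.0 ** (-1.0 / theta))
print(diagonal_ratio(lambda w: gumbel_A(w, theta), 0.99))
```

For an EV copula the diagonal ratio does not depend on $q$, which is a direct consequence of the scaling property (7.45).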


In the case when $\lambda_u = 0$ we must have $A(\tfrac{1}{2}) = 1$, and the convexity of dependence
functions dictates that $A(w)$ is identically one, so $C_0$ must be the independence
copula. In the higher-dimensional case this is also true: if $C$ is a $d$-dimensional
copula with all upper tail-dependence coefficients equal to zero, then the bivariate
margins of the limiting copula $C_0$ must all be independence copulas, and, in fact,
it can be shown that $C_0$ must therefore be the $d$-dimensional independence copula
(see Notes and Comments).

As an example consider the limiting distribution of multivariate maxima of Gaus- 
sian random vectors. Since the pairwise coefficients of tail dependence of Gaussian 
vectors are zero (see Example 5.32), the limiting distribution is a product of marginal 
Gumbel distributions. The convergence is extremely slow, but ultimately normalized 
componentwise maxima are independent in the limit. 

Now consider the multivariate t distribution, which has been an important model
throughout this book. If $X_1, \ldots, X_n$ are iid random vectors with a $t_d(\nu, \mu, \Sigma)$
distribution, we know from Example 7.29 that univariate maxima of the individual
components are attracted to univariate Fréchet distributions with parameter $1/\nu$.
Moreover, we know from Example 5.33 that tail dependence coefficients for the
t copula are strictly positive; the limiting EV copula cannot be the independence
copula.

In fact, the limiting EV copula for t-distributed random vectors can be calculated
using (7.54), although the calculations are tedious. In the bivariate case it is found
that the limiting copula, which we call the t-EV copula, has dependence function


$$A(w) = w\, t_{\nu+1}\left(\frac{(w/(1 - w))^{1/\nu} - \rho}{\sqrt{(1 - \rho^2)/(\nu + 1)}}\right) + (1 - w)\, t_{\nu+1}\left(\frac{((1 - w)/w)^{1/\nu} - \rho}{\sqrt{(1 - \rho^2)/(\nu + 1)}}\right), \tag{7.55}$$


where $t_{\nu+1}$ denotes the df of a standard univariate t distribution with $\nu + 1$ degrees
of freedom and $\rho$ is the off-diagonal component of $P = \wp(\Sigma)$. This dependence function
is shown in Figure 7.16 for four different values of $\nu$ and $\rho$ values ranging from
$-0.5$ to $0.9$ with increments of 0.1. As $\rho \to 1$ the t-EV copula converges to the
comonotonicity copula; as $\rho \to -1$ or as $\nu \to \infty$ it converges to the independence
copula.


Figure 7.16. Plots of dependence function for the t-EV copula for (a) $\nu = 2$,
(b) $\nu = 4$, (c) $\nu = 10$ and (d) $\nu = 20$, and with various values of $\rho$.


7.5.4 Modelling Multivariate Block Maxima 


A multivariate block maxima method analogous to the univariate method of Sec- 
tion 7.1.4 could be developed, although similar criticisms apply, namely that the 
block maxima method is not the most efficient way of making use of extreme data. 
Also, the kind of inference that this method allows may not be exactly what is desired 
in the multivariate case, as will be seen. 

Suppose we divide our underlying data into blocks as before and we denote
the realizations of the block maxima vectors by $M_{n,1}, \ldots, M_{n,m}$, where $m$ is the
total number of blocks. The distributional model suggested by the univariate and
multivariate maxima theory consists of GEV margins connected by an extreme value
copula.

In the multivariate theory there is, in a sense, a “correct” EV copula to use, which
is the copula $C_0$ to which the copula $C$ of the underlying distribution of the raw data
is attracted. However, the underlying copula $C$ is unknown and so the approach is
generally to work with any tractable EV copula that appears appropriate for the task
in hand. In a bivariate application, if we restrict to exchangeable copulas, then we
have at our disposal the Gumbel, Galambos and t-EV copulas, and a number of other
possibilities for which references in Notes and Comments should be consulted. As
will be apparent from Figures 7.15 and 7.16, the essential functional form of all
these families is really very similar; it makes sense to work with either Gumbel
or Galambos as these have simple forms that permit a relatively easy calculation
of the copula density (which is needed for likelihood inference). Even if the “true”
underlying copula were t, it would not really make sense to use the more complicated
t-EV copula, since the dependence function in (7.55) for any $\nu$ and $\rho$ can be very
accurately approximated by the dependence function of a Gumbel copula.

The Gumbel copula also allows us to explore the possibility of asymmetry by using 
the general non-exchangeable family described in Example 7.46. For applications in 
dimensions higher than two, the higher-dimensional extensions of Gumbel discussed 
in Sections 5.4.2 and 5.4.3 may be useful, although we should stress again that 
multivariate extreme value models are best suited to low-dimensional applications. 

Putting these considerations together, data on multivariate maxima could be modelled
using the df $H_{\xi,\mu,\sigma,\theta}(x) = C_\theta(H_{\xi_1,\mu_1,\sigma_1}(x_1), \ldots, H_{\xi_d,\mu_d,\sigma_d}(x_d))$ for some
tractable parametric EV copula $C_\theta$. The usual method involves maximum likelihood
inference and the maximization can either be performed in one step for all
parameters of the margins and copula or broken into two steps, whereby marginal
models are estimated first and then a parametric copula is fitted using the ideas in
Sections 5.5.2 and 5.5.3. The following bivariate example gives an idea of the kind
of inference that can be made with such a model.


Example 7.52. Let $M_{65,1}$ represent the quarterly maximum of daily percentage falls
of the US dollar against the euro and let $M_{65,2}$ represent the quarterly maximum of
daily percentage falls of the US dollar against the yen. We define a stress event for
each of these daily return series: for the dollar against the euro we might be concerned
about a 4% fall in any one day; for the dollar against the yen we might be concerned
about a 5% fall in any one day. We want to estimate the unconditional probability
that one or both of these stress events occurs over any quarter. The probability $p$ of
interest is given by $p = 1 - P(M_{65,1} \le 4\%,\ M_{65,2} \le 5\%)$ and approximated by
$1 - H_{\hat\xi,\hat\mu,\hat\sigma,\hat\theta}(0.04, 0.05)$, where the parameters are estimated from the block maxima
data. Of course, a more worrying scenario might be that both of these stress events
should occur on the same day. To calculate the probability of simultaneous extreme
events we require a different methodology, which is developed in Section 7.6.
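To make the calculation in Example 7.52 concrete, the sketch below evaluates $p = 1 - H(x_1, x_2)$ for a model with GEV margins and a Gumbel copula. All parameter values are hypothetical placeholders chosen for illustration (they are not estimates from any data set), the function names are ours, and falls are expressed in percentage points in this sketch.

```python
import math

def gev_cdf(x, xi, mu, sigma):
    """Univariate GEV df with shape xi, location mu and scale sigma."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    arg = 1.0 + xi * z
    if arg <= 0.0:
        return 0.0 if xi > 0.0 else 1.0  # x lies outside the support
    return math.exp(-arg ** (-1.0 / xi))

def gumbel_copula(u1, u2, theta):
    """Exchangeable bivariate Gumbel copula, theta >= 1."""
    if u1 == 0.0 or u2 == 0.0:
        return 0.0
    a, b = -math.log(u1), -math.log(u2)
    return math.exp(-((a ** theta + b ** theta) ** (1.0 / theta)))

def stress_probability(x1, x2, margin1, margin2, theta):
    """Probability that at least one block maximum exceeds its stress level."""
    return 1.0 - gumbel_copula(gev_cdf(x1, *margin1), gev_cdf(x2, *margin2), theta)

# Hypothetical (xi, mu, sigma) for the two margins; NOT fitted values.
m_eur = (0.10, 1.2, 0.4)
m_yen = (0.15, 1.5, 0.5)
for theta in (1.0, 1.5, 3.0):
    print(theta, stress_probability(4.0, 5.0, m_eur, m_yen, theta))
```

With $\theta = 1$ the two block maxima are treated as independent; larger values of $\theta$ reduce the probability that at least one of the two stress events occurs, since joint occurrences become more likely.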


Notes and Comments 


Early works on distributions for bivariate extremes include Geffroy (1958), 
Tiago de Oliveira (1958) and Sibuya (1960). A selection of further important papers 
in the development of the subject include Galambos (1975), de Haan and Resnick 
(1977), Balkema and Resnick (1977), Deheuvels (1980) and Pickands (1981). The 
texts by Galambos (1987) and Resnick (1987) have both been influential; our pre- 
sentation more closely resembles the former. 

Theorem 7.44 is proved in Galambos (1987) (see Theorem 5.2.1 and Lemma 5.4.1 
therein (see also Joe 1997, p. 173)). Theorem 7.45 is essentially a result of Pickands 
(1981). A complete version of the proof is given in Theorem 5.4.5 of Galambos 
(1987), although it is given in the form of a characterization of MEV distributions 
with Gumbel margins. This is easily reformulated as a characterization of the EV 
copulas. In the bivariate case necessary and sufficient conditions for $A(w)$ in (7.47)
to define a bivariate EV copula are given in Joe (1997, Theorem 6.4).

The copula of Example 7.47 appears in Galambos (1975). A good summary
of other bivariate and multivariate extreme value copulas is found in Kotz and 
Nadarajah (2000); they are presented as MEV distributions with unit Fréchet margins 
but the EV copulas are easily inferred from this presentation. See also Joe (1997, 
Chapters 5 and 6), in which EV copulas and their higher-dimensional extensions 
are discussed. Many parametric models for extremes have been suggested by Tawn 
(1988, 1990). 

Theorem 7.48 is found in Galambos (1987), where the necessary and sufficient
copula convergence criterion is given as $\lim_{n \to \infty} C^n(u^{1/n}) = C_0(u)$ for positive
integers $n$; by noting that for any $t > 0$ we have the inequalities


$$C^{[t]+1}(u^{1/[t]}) \le C^t(u^{1/t}) \le C^{[t]}(u^{1/([t]+1)}),$$


it can be inferred that this is equivalent to $\lim_{t \to \infty} C^t(u^{1/t}) = C_0(u)$. Further
equivalent CDA conditions are found in Takahashi (1994). The idea of a domain of 
attraction of an EV copula also appears in Abdous, Ghoudi and Khoudraji (1999). 
Not every copula is in a copula domain of attraction; a counterexample may be 
found in Schlather and Tawn (2002). 

We have shown that pairwise asymptotic independence for the components of 
random vectors implies pairwise independence of the corresponding components 
in the limiting MEV distribution of the maxima. Pairwise independence for an 
MEV distribution in fact implies mutual independence, as recognized and described 
by a number of authors: see Galambos (1987, Corollary 5.3.1), Resnick (1987, 
Theorem 5.27), and the earlier work of Geffroy (1958) and Sibuya (1960). 


7.6 Multivariate Threshold Exceedances 


In this section we describe practically useful models for multivariate extremes 
(again in low-dimensional applications) that build on the basic idea of modelling 
excesses over high thresholds with the generalized Pareto distribution (GPD) as 
in Section 7.2. The idea is to use GPD-based tail models of the kind discussed in 
Section 7.2.3 together with appropriate copulas to obtain models for multivariate 
threshold exceedances. 


7.6.1 Threshold Models Using EV Copulas 


Assume that the vectors $X_1, \ldots, X_n$ have unknown joint distribution
$F(x) = C(F_1(x_1), \ldots, F_d(x_d))$ for some unknown copula $C$ and margins $F_1, \ldots, F_d$, and
that $F$ is in the domain of attraction of an MEV distribution. Much as in the univariate
case we would like to approximate the upper tail of $F(x)$ above some vector of
high thresholds $u = (u_1, \ldots, u_d)'$. The univariate theory of Sections 7.2.2 and 7.2.3
tells us that, for $x_j \ge u_j$ and $u_j$ high enough, the tail of the marginal distribution
$F_j$ may be approximated by a GPD-based functional form


$$\tilde F_j(x_j) = 1 - \lambda_j \left(1 + \xi_j \frac{x_j - u_j}{\beta_j}\right)^{-1/\xi_j}, \tag{7.56}$$


where $\lambda_j = \bar F_j(u_j)$. This suggests that for $x \ge u$ we use the approximation
$F(x) \approx C(\tilde F_1(x_1), \ldots, \tilde F_d(x_d))$. But $C$ is also unknown and must itself be
approximated in the tail. The following heuristic argument suggests that we should be able
to replace $C$ by its limiting copula $C_0$.

The CDA condition (7.52) suggests that for any value $v \in (0, 1)^d$ and $t$ sufficiently
large we may make the approximation $C(v^{1/t}) \approx C_0^{1/t}(v)$. If we now write
$w = v^{1/t}$, we have


$$C(w) \approx C_0^{1/t}(w^t) = C_0(w), \tag{7.57}$$


by the scaling property of EV copulas. The approximation (7.57) will be best for
large values of $w$, since $v^{1/t} \to 1$ as $t \to \infty$.

We assume then that we can substitute the copula $C$ with its EV limit $C_0$ in the
tail, and this gives us the overall model


$$\tilde F(x) = C_0(\tilde F_1(x_1), \ldots, \tilde F_d(x_d)), \quad x \ge u. \tag{7.58}$$


We complete the model specification by choosing a flexible and tractable parametric
EV copula for $C_0$. As before, the Gumbel copula family is particularly convenient.


7.6.2 Fitting a Multivariate Tail Model 


Assume we have observations $X_1, \ldots, X_n$ from a df $F$ with a tail that permits the
approximation (7.58). Of these observations, only a minority are likely to be in the
joint tail ($x \ge u$); other observations may exceed some of the individual thresholds
but lie below others. The usual way of making inferences about all the parameters of
such a model (the marginal parameters $\xi_j$, $\beta_j$, $\lambda_j$, for $j = 1, \ldots, d$, and the copula
parameter (or parameter vector) $\theta$) is to maximize a likelihood for censored data.

Let us suppose that $m_i$ components of the data vector $X_i$ exceed their respective
thresholds in the vector $u$. The only relevant information that the remaining components
convey is that they lie below their thresholds; such a component $X_{i,j}$ is said
to be censored at the value $u_j$. The contribution to the likelihood of $X_i$ is given by

$$L_i = L_i(\xi, \beta, \lambda, \theta; X_i) = \left.\frac{\partial^{m_i} \tilde F(x_1, \ldots, x_d)}{\partial x_{j_1} \cdots \partial x_{j_{m_i}}}\right|_{\max(X_i, u)},$$

where $\tilde F$ is the tail model (7.58) and the indices $j_1, \ldots, j_{m_i}$ are those of the
components of $X_i$ exceeding their thresholds.

For example, in a bivariate model with Gumbel copula (5.11) the likelihood
contribution would be


$$L_i = \begin{cases}
C^{\mathrm{Gu}}_\theta(1 - \lambda_1, 1 - \lambda_2), & X_{i,1} \le u_1,\ X_{i,2} \le u_2, \\
C^{\mathrm{Gu}}_{\theta,1}(\tilde F_1(X_{i,1}), 1 - \lambda_2)\, \tilde f_1(X_{i,1}), & X_{i,1} > u_1,\ X_{i,2} \le u_2, \\
C^{\mathrm{Gu}}_{\theta,2}(1 - \lambda_1, \tilde F_2(X_{i,2}))\, \tilde f_2(X_{i,2}), & X_{i,1} \le u_1,\ X_{i,2} > u_2, \\
c^{\mathrm{Gu}}_\theta(\tilde F_1(X_{i,1}), \tilde F_2(X_{i,2}))\, \tilde f_1(X_{i,1})\, \tilde f_2(X_{i,2}), & X_{i,1} > u_1,\ X_{i,2} > u_2,
\end{cases} \tag{7.59}$$


where $\tilde f_j$ denotes the density of the univariate tail model $\tilde F_j$ in (7.56), $c^{\mathrm{Gu}}_\theta(u_1, u_2)$
denotes the Gumbel copula density and $C^{\mathrm{Gu}}_{\theta,j}(u_1, u_2) := (\partial/\partial u_j) C^{\mathrm{Gu}}_\theta(u_1, u_2)$
denotes a conditional distribution of the copula, as in (5.15). The overall likelihood
is a product of such contributions and is maximized with respect to all parameters
of the marginal models and copula.


Table 7.3. Parameter estimates and standard errors (in brackets) for a bivariate
tail model fitted to exchange-rate return data; see Example 7.53 for details.

             $/€                 $/¥
    u        0.75                1.00
    N_u      189                 126
    λ        0.094 (0.0065)      0.063 (0.0054)
    ξ        -0.049 (0.066)      0.095 (0.11)
    β        0.33 (0.032)        0.38 (0.053)
    θ        1.10 (0.030)

In a simpler approach, parameters of the marginal GPD models could be estimated 
as in Section 7.2.3 and only the parameters of the copula obtained from the above 
likelihood. In fact this is also a sensible way of getting starting values before going 
on to the global maximization over all parameters. 
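The four cases of the censored likelihood contribution (7.59) can be coded directly. The following is a sketch under stated assumptions rather than the procedure used for the fits reported here: the function names are ours, the closed-form Gumbel partial derivative and density were derived by differentiating $C^{\mathrm{Gu}}_\theta(u_1, u_2) = \exp\{-((-\ln u_1)^\theta + (-\ln u_2)^\theta)^{1/\theta}\}$, and the marginal shape parameters are assumed to be non-zero.

```python
import math

def gpd_tail_cdf(x, u, xi, beta, lam):
    """Tail model (7.56), valid for x >= u; assumes xi != 0."""
    return 1.0 - lam * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)

def gpd_tail_pdf(x, u, xi, beta, lam):
    """Density of the tail model (7.56) for x >= u."""
    return (lam / beta) * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi - 1.0)

def gumbel_cdf(u1, u2, theta):
    a, b = -math.log(u1), -math.log(u2)
    return math.exp(-((a ** theta + b ** theta) ** (1.0 / theta)))

def gumbel_partial(u1, u2, theta, j):
    """Partial derivative of the Gumbel copula with respect to u_j, j in {1, 2}."""
    a, b = -math.log(u1), -math.log(u2)
    s = a ** theta + b ** theta
    num, den = (a, u1) if j == 1 else (b, u2)
    return gumbel_cdf(u1, u2, theta) * s ** (1.0 / theta - 1.0) * num ** (theta - 1.0) / den

def gumbel_pdf(u1, u2, theta):
    """Gumbel copula density (mixed second partial derivative)."""
    a, b = -math.log(u1), -math.log(u2)
    s = a ** theta + b ** theta
    return (gumbel_cdf(u1, u2, theta) / (u1 * u2) * (a * b) ** (theta - 1.0)
            * s ** (1.0 / theta - 2.0) * (s ** (1.0 / theta) + theta - 1.0))

def likelihood_contribution(x, u, xi, beta, lam, theta):
    """Contribution (7.59) of one bivariate observation x = (x1, x2)."""
    exceeds = [x[j] > u[j] for j in range(2)]
    # A censored component is evaluated at its threshold, where the tail model equals 1 - lam.
    F = [gpd_tail_cdf(max(x[j], u[j]), u[j], xi[j], beta[j], lam[j]) for j in range(2)]
    f = [gpd_tail_pdf(x[j], u[j], xi[j], beta[j], lam[j]) if exceeds[j] else 1.0
         for j in range(2)]
    if exceeds[0] and exceeds[1]:
        return gumbel_pdf(F[0], F[1], theta) * f[0] * f[1]
    if exceeds[0]:
        return gumbel_partial(F[0], F[1], theta, 1) * f[0]
    if exceeds[1]:
        return gumbel_partial(F[0], F[1], theta, 2) * f[1]
    return gumbel_cdf(F[0], F[1], theta)

# Illustration with the rounded estimates of Table 7.3 and a made-up observation.
u, xi, beta, lam = (0.75, 1.00), (-0.049, 0.095), (0.33, 0.38), (0.094, 0.063)
print(math.log(likelihood_contribution((1.2, 0.4), u, xi, beta, lam, 1.10)))
```

Summing the logarithms of such contributions over all observations gives the log-likelihood that would be passed to a numerical optimizer; the choice of optimizer is left open here.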

The model described by the likelihood (7.59) has been studied in some detail 
by Ledford and Tawn (1996) and a number of related models have been studied in 
the statistical literature on multivariate EVT (see Notes and Comments for more 
details). 


Example 7.53 (bivariate tail model for exchange-rate return data). We analyse 
daily percentage falls in the value of the US dollar against the euro and the Japanese 
yen, taking data for the eight-year period 1996-2003. We have 2008 daily returns 
and choose to set thresholds at 0.75% and 1.00%, giving 189 and 126 exceedances, 
respectively. In a full maximization of the likelihood over all parameters, we obtained 
the estimates and standard errors shown in Table 7.3. The value of the maximized 
log-likelihood is $-1064.7$, compared with $-1076.4$ in a model where independence
in the tail is assumed (i.e. a Gumbel copula with $\theta = 1$), showing strong evidence
against an independence assumption.

We can now use the fitted model (7.58) to make various calculations about stress
events. For example, an estimate of the probability that on any given day the dollar
falls by more than 2% against both currencies is given by


$$p_{12} := 1 - \tilde F_1(2.00) - \tilde F_2(2.00) + C^{\mathrm{Gu}}_{\hat\theta}(\tilde F_1(2.00), \tilde F_2(2.00)) = 0.000315,$$


with $\tilde F_j$ as in (7.56), making this approximately a 13-year event (assuming 250 trading
days per year). The marginal probabilities of falls in value of this magnitude are
$p_1 := 1 - \tilde F_1(2.00) = 0.0014$ and $p_2 := 1 - \tilde F_2(2.00) = 0.0061$. We can use this
information to calculate so-called spillover probabilities for the conditional occurrence
of stress events; for example, the probability that the dollar falls 2% against
the yen given that it falls 2% against the euro is estimated to be $p_{12}/p_1 = 0.23$.
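The stress-event calculations of Example 7.53 can be retraced from the rounded estimates in Table 7.3. The sketch below uses our own function names; because the inputs are rounded to two or three significant figures, the results should be expected to agree with the figures quoted in the example only approximately.

```python
import math

def gpd_tail_cdf(x, u, xi, beta, lam):
    """Univariate tail model (7.56) for x >= u; assumes xi != 0."""
    return 1.0 - lam * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)

def gumbel_copula(u1, u2, theta):
    """Exchangeable bivariate Gumbel copula."""
    a, b = -math.log(u1), -math.log(u2)
    return math.exp(-((a ** theta + b ** theta) ** (1.0 / theta)))

# Rounded estimates from Table 7.3: dollar/euro first, dollar/yen second.
eur = dict(u=0.75, xi=-0.049, beta=0.33, lam=0.094)
yen = dict(u=1.00, xi=0.095, beta=0.38, lam=0.063)
theta_hat = 1.10

x = 2.00  # a 2% daily fall
F1 = gpd_tail_cdf(x, **eur)
F2 = gpd_tail_cdf(x, **yen)
p1, p2 = 1.0 - F1, 1.0 - F2
p12 = 1.0 - F1 - F2 + gumbel_copula(F1, F2, theta_hat)  # both falls on the same day
spillover = p12 / p1                                     # P(yen falls | euro falls)
years = 1.0 / (250.0 * p12)                              # return period in trading years

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, p12 = {p12:.6f}")
print(f"spillover = {spillover:.2f}, return period = {years:.1f} years")
```

Replacing `theta_hat` by 1 gives `p12 = p1 * p2`, which shows how strongly the joint stress probability depends on the fitted tail dependence.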


7.6.3 Threshold Copulas and Their Limits


Another, more recent, approach to multivariate extremes looks explicitly at the kind 
of copulas we get when we condition observations to lie above or below extreme 
thresholds. Just as the GPD is a natural limiting model for univariate threshold 
exceedances, so we can find classes of copula that are natural limiting models for 
the dependence structure of multivariate exceedances. 

The theory has been studied in most detail in the case of exchangeable bivariate 
copulas, and we concentrate on this case. Moreover, it proves slightly easier to 
switch our focus at this stage and first consider the lower-left tail of a probability 
distribution, before showing how the theory is adapted to the upper-right tail. 


Lower threshold copulas and their limits. Consider a random vector $(X_1, X_2)$
with continuous margins $F_1$ and $F_2$ and an exchangeable copula $C$. We consider
the distribution of $(X_1, X_2)$ conditional on both being below their $v$-quantiles, an
event we denote by $A_v = \{X_1 \le F_1^{\leftarrow}(v),\ X_2 \le F_2^{\leftarrow}(v)\}$, $0 < v \le 1$. Assuming
$C(v, v) \ne 0$, the probability that $X_1$ lies below its $x_1$-quantile and $X_2$ lies below its
$x_2$-quantile conditional on this event is

$$P(X_1 \le F_1^{\leftarrow}(x_1),\ X_2 \le F_2^{\leftarrow}(x_2) \mid A_v) = \frac{C(x_1, x_2)}{C(v, v)}, \quad x_1, x_2 \in [0, v].$$

Considered as a function of $x_1$ and $x_2$ this defines a bivariate df on $[0, v]^2$, and by
Sklar’s Theorem we can write

$$\frac{C(x_1, x_2)}{C(v, v)} = C_v^0(F_{(v)}(x_1), F_{(v)}(x_2)), \quad x_1, x_2 \in [0, v], \tag{7.60}$$

for a unique copula $C_v^0$ and continuous marginal distribution functions

$$F_{(v)}(x) = P(X_1 \le F_1^{\leftarrow}(x) \mid A_v) = \frac{C(x, v)}{C(v, v)}, \quad 0 \le x \le v. \tag{7.61}$$

This unique copula may be written as

$$C_v^0(u_1, u_2) = \frac{C(F_{(v)}^{\leftarrow}(u_1), F_{(v)}^{\leftarrow}(u_2))}{C(v, v)}, \tag{7.62}$$

and will be referred to as the lower threshold copula of $C$ at level $v$. Juri and Wüthrich
(2002), who developed the approach we describe in this section, refer to it as a lower
tail dependence copula (LTDC). It is of interest to attempt to evaluate limits for this
copula as $v \to 0$; such a limit will be known as a limiting lower threshold copula.

Much like the GPD in Example 7.19, limiting lower threshold copulas must possess
a stability property under the operation of calculating lower threshold copulas
in (7.62). A copula $C$ is a limiting lower threshold copula if, for any threshold
$0 < v \le 1$, it satisfies

$$C_v^0(u_1, u_2) = C(u_1, u_2). \tag{7.63}$$


Example 7.54 (Clayton copula as limiting lower threshold copula). For the
standard bivariate Clayton copula in (5.12) we can easily calculate that $F_{(v)}$ in (7.61)
is

$$F_{(v)}(x) = \frac{(x^{-\theta} + v^{-\theta} - 1)^{-1/\theta}}{(2v^{-\theta} - 1)^{-1/\theta}}, \quad 0 \le x \le v,$$

and its inverse is

$$F_{(v)}^{\leftarrow}(u) = (u^{-\theta}(2v^{-\theta} - 1) + 1 - v^{-\theta})^{-1/\theta}, \quad 0 \le u \le 1.$$


Thus the lower threshold copula for the Clayton copula can be calculated from (7.62)
and it may be verified that this is again the Clayton copula. In other words, the Clayton
copula is a limiting lower threshold copula because (7.63) holds.
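The verification mentioned at the end of Example 7.54 can be carried out numerically. This sketch, with function names of our own choosing, evaluates the lower threshold copula (7.62) of the Clayton copula at a few thresholds and compares it with the Clayton copula itself; arguments are assumed to lie in $(0, 1]$ and $\theta > 0$.

```python
def clayton(u1, u2, theta):
    """Standard bivariate Clayton copula, theta > 0."""
    return (u1 ** (-theta) + u2 ** (-theta) - 1.0) ** (-1.0 / theta)

def margin(x, v, theta):
    """Conditional margin F_(v)(x) = C(x, v) / C(v, v) from (7.61), 0 < x <= v."""
    return clayton(x, v, theta) / clayton(v, v, theta)

def margin_inv(u, v, theta):
    """Inverse of F_(v), as given in Example 7.54."""
    return (u ** (-theta) * (2.0 * v ** (-theta) - 1.0) + 1.0 - v ** (-theta)) ** (-1.0 / theta)

def lower_threshold_copula(u1, u2, v, theta):
    """Lower threshold copula (7.62) of the Clayton copula at level v."""
    x1, x2 = margin_inv(u1, v, theta), margin_inv(u2, v, theta)
    return clayton(x1, x2, theta) / clayton(v, v, theta)

theta = 2.0
for v in (0.9, 0.2, 0.01):
    lhs = lower_threshold_copula(0.3, 0.7, v, theta)
    print(v, lhs, clayton(0.3, 0.7, theta))
```

For every threshold the two printed values coincide up to rounding error, which is the stability property (7.63).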


Upper threshold copulas. To define upper threshold copulas we consider again a
random vector $(X_1, X_2)$ with copula $C$ and margins $F_1$ and $F_2$. We now condition on
the event $A_v = \{X_1 > F_1^{\leftarrow}(v),\ X_2 > F_2^{\leftarrow}(v)\}$ for $0 \le v < 1$. We have the identity


$$P(X_1 > F_1^{\leftarrow}(x_1),\ X_2 > F_2^{\leftarrow}(x_2) \mid A_v) = \frac{\bar C(x_1, x_2)}{\bar C(v, v)}, \quad x_1, x_2 \in [v, 1].$$

Since $\bar C(x_1, x_2)/\bar C(v, v)$ defines a bivariate survival function on $[v, 1]^2$, by (5.13)
we can write

$$\frac{\bar C(x_1, x_2)}{\bar C(v, v)} = \hat C_v^1(\bar G_{(v)}(x_1), \bar G_{(v)}(x_2)), \quad x_1, x_2 \in [v, 1], \tag{7.64}$$


for some survival copula $\hat C_v^1$ of a copula $C_v^1$ and marginal survival functions


$$\bar G_{(v)}(x) = P(X_1 > F_1^{\leftarrow}(x) \mid A_v) = \frac{\bar C(x, v)}{\bar C(v, v)}, \quad v \le x \le 1. \tag{7.65}$$


The copula $C_v^1$ is known as the upper threshold copula at level $v$ and it is now
of interest to find limits as $v \to 1$, which are known as limiting upper threshold
copulas. In fact, as the following lemma shows, it suffices to study either lower or
upper threshold copulas because results for one follow easily from results for the
other.

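Lemma 7.55. The survival copula of the upper threshold copula of $C$ at level $v$ is
the lower threshold copula of $\hat C$ at level $1 - v$.


Proof. We use the identity $\bar C(u_1, u_2) = \hat C(1 - u_1, 1 - u_2)$ and (7.65) to rewrite
(7.64) as

$$\frac{\hat C(1 - x_1, 1 - x_2)}{\hat C(1 - v, 1 - v)} = \hat C_v^1\left(\frac{\hat C(1 - x_1, 1 - v)}{\hat C(1 - v, 1 - v)}, \frac{\hat C(1 - v, 1 - x_2)}{\hat C(1 - v, 1 - v)}\right).$$

Writing $y_1 = 1 - x_1$, $y_2 = 1 - x_2$ and $w = 1 - v$ we have

$$\frac{\hat C(y_1, y_2)}{\hat C(w, w)} = \hat C_v^1\left(\frac{\hat C(y_1, w)}{\hat C(w, w)}, \frac{\hat C(w, y_2)}{\hat C(w, w)}\right), \quad y_1, y_2 \in [0, w],$$


and comparison with (7.60) and (7.61) shows that $\hat C_v^1$ must be the lower threshold
copula of $\hat C$ at the level $w = 1 - v$.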


It follows that the survival copulas of limiting lower threshold copulas are limiting
upper threshold copulas. The Clayton survival copula is a limiting upper threshold
copula.

Relationship between limiting threshold copulas and EV copulas. We give one
result which shows how limiting upper threshold copulas may be calculated for 
underlying exchangeable copulas C that are in the domain of attraction of EV copulas 
with tail dependence, thus linking the study of threshold copulas to the theory of 
Section 7.5.3. 


Theorem 7.56. If $C$ is an exchangeable copula with upper tail-dependence coefficient
$\lambda_u > 0$ satisfying $C \in \mathrm{CDA}(C_0)$, then $C$ has a limiting upper threshold copula
which is the survival copula of the df


$$G(x_1, x_2) = \frac{(x_1 + x_2)(1 - A(x_1/(x_1 + x_2)))}{\lambda_u}, \tag{7.66}$$

where $A$ is the dependence function of $C_0$. Also, $\hat C$ has a limiting lower threshold
copula which is the copula of $G$.


Example 7.57 (upper threshold copula of Galambos copula). We use this result 
to calculate the limiting upper threshold copula for the Galambos copula. We recall 
that this is an EV copula with dependence function given in (7.50) and consider the 
standard exchangeable case with $\alpha = \beta = 1$. Using the methods of Section 5.2.3 it
may easily be calculated that the coefficient of upper tail dependence of this copula
is $\lambda_u = 2^{-1/\theta}$. Thus the bivariate distribution $G(x_1, x_2)$ in (7.66) is

$$ G(x_1, x_2) = \left( \frac{x_1^{-\theta} + x_2^{-\theta}}{2} \right)^{-1/\theta}, \quad (x_1, x_2) \in (0, 1]^2, $$


the copula of which is the Clayton copula. Thus the limiting upper threshold copula 
in this case is the Clayton survival copula. Moreover, the limiting lower threshold 
copula of the Galambos survival copula is the Clayton copula. 
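
The calculation in this example can be checked numerically. The following is a minimal sketch, assuming the Galambos dependence function in (7.50) takes the standard form $A(w) = 1 - (w^{-\theta} + (1 - w)^{-\theta})^{-1/\theta}$ when $\alpha = \beta = 1$; all function names are hypothetical and chosen only for illustration. It evaluates $G$ from (7.66), inverts its margins and compares the resulting copula with the Clayton copula.

```python
import math


def galambos_A(w, theta):
    """Galambos dependence function for alpha = beta = 1 (assumed form of (7.50))."""
    if w <= 0.0 or w >= 1.0:
        return 1.0
    return 1.0 - (w ** -theta + (1.0 - w) ** -theta) ** (-1.0 / theta)


def upper_tail_dependence(theta):
    """lambda_u = 2 - 2 A(1/2), the upper tail-dependence coefficient of an EV copula."""
    return 2.0 - 2.0 * galambos_A(0.5, theta)


def G(x1, x2, theta):
    """The df G of (7.66) on (0, 1]^2 for the Galambos dependence function."""
    s = x1 + x2
    return s * (1.0 - galambos_A(x1 / s, theta)) / upper_tail_dependence(theta)


def G_margin_inverse(u, theta):
    """Inverse of the margin G(x, 1) = ((x^-theta + 1) / 2)^(-1/theta)."""
    return (2.0 * u ** -theta - 1.0) ** (-1.0 / theta)


def copula_of_G(u1, u2, theta):
    """Copula of G, obtained by plugging the inverse margins into G."""
    return G(G_margin_inverse(u1, theta), G_margin_inverse(u2, theta), theta)


def clayton(u1, u2, theta):
    """Bivariate Clayton copula with parameter theta > 0."""
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0):
        gap = max(
            abs(copula_of_G(u1, u2, theta) - clayton(u1, u2, theta))
            for u1 in (0.1, 0.5, 0.9)
            for u2 in (0.2, 0.6, 1.0)
        )
        print(theta, upper_tail_dependence(theta), gap)
```

The printed gap should be at the level of floating-point rounding, which is consistent with the claim that the copula of $G$ is the Clayton copula.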


The Clayton copula turns out to be an important attractor for a large class of 
underlying exchangeable copulas. Juri and Wüthrich (2003) have shown that all 
Archimedean copulas whose generators are regularly varying at 0 with negative 
parameter (meaning that $\phi(t)$ satisfies $\lim_{t \to 0} \phi(xt)/\phi(t) = x^{-\alpha}$ for all $x$ and some
$\alpha > 0$) share the Clayton copula $C_\alpha^{\mathrm{Cl}}$ as their limiting lower threshold copula.

It is of interest to calculate limiting lower and upper threshold copulas for the 
t copula, and this can be done using Theorem 7.56 and the expression for the 
dependence function in (7.55). However, the resulting limit is not convenient for 
practical purposes because of the complexity of this dependence function. We have 
already remarked in Section 7.5.4 that the dependence function of the t-EV copula is 
indistinguishable for all practical purposes from the dependence functions of other 
exchangeable EV copulas, such as Gumbel and Galambos. Thus Theorem 7.56 
suggests that instead of working with the true limiting upper threshold copula of the 
t copula we could instead work with the limiting upper threshold copula of, say, the 
Galambos copula, i.e. the Clayton survival copula. Similarly, we could work with 
the Clayton copula as an approximation for the true limiting lower threshold copula 
of the t copula.

Limiting threshold copulas in practice. Limiting threshold copulas in dimensions
higher than two have not yet been extensively studied, nor have limits for non-
exchangeable bivariate copulas or limits when we define two thresholds $v_1$ and $v_2$
and let these tend to zero (or one) at different rates. Thus the practical use of these 
ideas is largely in bivariate applications when thresholds are set at approximately 
similar quantiles and a symmetric dependence structure is assumed. 

Let us consider a situation where we have a bivariate distribution that appears to 
exhibit tail dependence in both the upper-right and lower-left corners. While true 
lower and upper limiting threshold copulas may exist for this unknown distribution, 
we could in practice simply adopt a tractable and flexible parametric limiting thresh- 
old copula family. It is particularly easy to use the Clayton copula and its survival 
copula as lower and upper limits, respectively. 

Suppose, for example, that we set high thresholds at $u = (u_1, u_2)'$, so that
$P(X_1 > u_1) \approx P(X_2 > u_2)$ and both probabilities are small. For the conditional distribution
of $(X_1, X_2)$ over the threshold $u$ we could assume a model of the form

$$ P(X \le x \mid X > u) \approx \hat{C}_\theta^{\mathrm{Cl}}\bigl(G_{\xi_1,\beta_1}(x_1 - u_1), G_{\xi_2,\beta_2}(x_2 - u_2)\bigr), \quad x > u, $$

where $\hat{C}_\theta^{\mathrm{Cl}}$ is the Clayton survival copula and $G_{\xi_j,\beta_j}$ denotes a GPD, as defined
in 7.16. Inference about the model parameters $(\theta, \xi_1, \beta_1, \xi_2, \beta_2)$ would be based on
the exceedance data above the thresholds and would use the methods discussed in
Section 5.5.

Similarly, for a vector of low thresholds $u$ satisfying $P(X_1 \le u_1) \approx P(X_2 \le u_2)$
with both these probabilities small, we could approximate the conditional distribu-
tion of $(X_1, X_2)$ below the threshold $u$ by a model of the form

$$ P(X \le x \mid X < u) \approx C_\theta^{\mathrm{Cl}}\bigl(\bar{G}_{\xi_1,\beta_1}(u_1 - x_1), \bar{G}_{\xi_2,\beta_2}(u_2 - x_2)\bigr), \quad x < u, $$

where $C_\theta^{\mathrm{Cl}}$ is the Clayton copula and $\bar{G}_{\xi_j,\beta_j}$ denotes a GPD survival function. Infer-
ence about the model parameters would be based on the data below the thresholds
and would use the methods of Section 5.5.
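
To make the two tail models concrete, here is a minimal sketch that evaluates them for given parameter values. It assumes the standard GPD form $G_{\xi,\beta}(x) = 1 - (1 + \xi x/\beta)^{-1/\xi}$ (with the exponential limit for $\xi = 0$) and the usual relation $\hat{C}(u_1, u_2) = u_1 + u_2 - 1 + C(1 - u_1, 1 - u_2)$ between a bivariate copula and its survival copula. The function names and the numerical values are hypothetical; no parameter estimation is attempted.

```python
import math


def gpd_cdf(x, xi, beta):
    """GPD df G_{xi,beta}(x) for an excess x >= 0, with beta > 0."""
    if x <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-x / beta)
    z = 1.0 + xi * x / beta
    if z <= 0.0:  # beyond the finite upper endpoint when xi < 0
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


def clayton(u1, u2, theta):
    """Bivariate Clayton copula with parameter theta > 0."""
    if u1 <= 0.0 or u2 <= 0.0:
        return 0.0
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)


def clayton_survival(u1, u2, theta):
    """Clayton survival copula: u1 + u2 - 1 + C(1 - u1, 1 - u2)."""
    return u1 + u2 - 1.0 + clayton(1.0 - u1, 1.0 - u2, theta)


def upper_tail_cdf(x, u, theta, xi, beta):
    """Model for P(X <= x | X > u): Clayton survival copula with GPD margins."""
    g = [gpd_cdf(x[j] - u[j], xi[j], beta[j]) for j in range(2)]
    return clayton_survival(g[0], g[1], theta)


def lower_tail_cdf(x, u, theta, xi, beta):
    """Model for P(X <= x | X < u): Clayton copula of GPD survival functions."""
    s = [1.0 - gpd_cdf(u[j] - x[j], xi[j], beta[j]) for j in range(2)]
    return clayton(s[0], s[1], theta)


if __name__ == "__main__":
    # Purely illustrative parameter values.
    theta, xi, beta = 1.5, (0.2, 0.3), (0.010, 0.012)
    print(upper_tail_cdf((0.07, 0.06), (0.05, 0.04), theta, xi, beta))
    print(lower_tail_cdf((-0.07, -0.06), (-0.05, -0.04), theta, xi, beta))
```

In this sketch the same parameter vector is used in both tails only for brevity; in practice the upper and lower tail models would be fitted separately to the respective exceedance data.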


Notes and Comments


The GPD-based tail model (7.58) and inference for censored data using a likeli- 
hood of the form (7.59) have been studied by Ledford and Tawn (1996), although 
the derivation of the model uses somewhat different asymptotic reasoning based on 
a characterization of multivariate domains of attraction of MEV distributions with 
unit Fréchet margins found in Resnick (1987). The authors concentrate on the model 
with Gumbel (logistic) dependence structure and discuss, in particular, testing for 
asymptotic independence of extremes. Likelihood inference is non-problematic (the
problem being essentially regular) when $\theta > 0$ and $\xi_j > -\tfrac{1}{2}$, but testing for
independence of extremes $\theta = 1$ is not quite so straightforward since this is a boundary
point of the parameter space. This case is possibly more interesting in environmental 
applications than in financial ones, where we tend to expect dependence of extreme 
values. 

A related bivariate GPD model is presented in Smith, Tawn and Coles (1997). In
our notation they essentially consider a model of the form

$$ F(x_1, \ldots, x_d) = 1 + \ln C_0\bigl(\exp(\tilde{F}_1(x_1) - 1), \ldots, \exp(\tilde{F}_d(x_d) - 1)\bigr), \quad x \ge k, $$

where $C_0$ is an extreme value copula. This model is also discussed in Smith (1994)
and Ledford and Tawn (1996); it is pointed out that $F$ does not reduce to a product
of marginal distributions in the case when $C_0$ is the independence copula, unlike the
model in (7.58).

Another style of statistical model for multivariate extremes is based on the point 
process theory of multivariate extremes developed in de Haan (1985), de Haan and 
Resnick (1977) and Resnick (1987). Statistical models using this theory are found 
in Coles and Tawn (1991) and Joe, Smith and Weissman (1992); see also the texts of 
Joe (1997) and Coles (2001). New approaches to modelling multivariate extremes 
can be found in Heffernan and Tawn (2004) and Balkema and Embrechts (2004); 
the latter paper considers applications to stress testing high-dimensional portfolios 
in finance. 

Limiting threshold copulas are studied in Juri and Wüthrich (2002, 2003). In
the latter paper it is demonstrated that the Clayton copula is an attractor for the
threshold copulas of a wide class of Archimedean copulas; moreover a version
of our Theorem 7.56 is proved. Limiting threshold copulas for the t copula are
investigated in Demarta and McNeil (2005). The usefulness of Clayton’s copula and
survival copula for describing the dependence in the tails of bivariate financial return 
data was confirmed in a large-scale empirical study of high-frequency exchange-rate 
returns by Breymann, Dias and Embrechts (2003). 


8 


Credit Risk Management 


Credit risk is the risk that the value of a portfolio changes due to unexpected changes 
in the credit quality of issuers or trading partners. This subsumes both losses due to 
defaults and losses caused by changes in credit quality, such as the downgrading of 
a counterparty in an internal or external rating system. Credit risk is omnipresent 
in the portfolio of a typical financial institution. To begin with, the lending and cor- 
porate bond portfolios are obviously affected by credit risk. Perhaps less obviously, 
credit risk accompanies any OTC (over-the-counter, i.e. non-exchange-guaranteed) 
derivative transaction such as a swap, because the default of one of the parties 
involved may substantially affect the actual pay-off of the transaction. Moreover, 
in recent years a specialized market for credit derivatives has emerged in which 
financial institutions are active players (see Section 9.1 for details). 

This brief list should convince the reader that credit risk is a highly relevant risk 
category indeed, as it relates to the core activities of most banks. Credit risk is also 
at the heart of many recent developments on the regulatory side, such as the new 
Basel II Capital Accord discussed in Chapter 1. We devote two chapters to this 
important risk category. In the present chapter we focus on static models and credit 
risk management; dynamic models and credit derivatives are discussed in Chapter 9. 


8.1 Introduction to Credit Risk Modelling 


In this section we provide a brief overview of the various model types that are used 
in credit risk before discussing some of the main challenges that are encountered in 
credit risk management. 


8.1.1 Credit Risk Models 


The development of the market for credit derivatives and the Basel II process have
generated a lot of interest in quantitative credit risk models in industry, academia and
among regulators, so that credit risk modelling is at present a very active subfield of 
quantitative finance and risk management. In this context it is interesting that parts 
of the new minimum capital requirements for credit risk are closely linked to the 
structure of existing credit portfolio models, as will be explained in more detail in 
Section 8.4.5. 

There are two main areas of application for quantitative credit risk models: credit 
risk management and the analysis of credit-risky securities. Credit risk management 
models are used to determine the loss distribution of a loan or bond portfolio over 
a fixed time period (typically at least one year), and to compute loss-distribution-based
risk measures or to make risk-capital allocations of the kind discussed in
Section 6.3. Hence these models are typically static, meaning that the focus is on the 
loss distribution for the fixed time period rather than a stochastic process describing 
the evolution of risk in time. 

For the analysis of credit-risky securities, on the other hand, dynamic models 
(generally in continuous time) are needed, because the pay-off of most products 
depends on the exact timing of default. Moreover, in building a pricing model one 
often works directly under an equivalent martingale or risk-neutral probability mea- 
sure (as opposed to the real-world probability measure). Issues related to dynamic 
credit risk models and risk-neutral and real-world measures will be studied in detail 
in Chapter 9. 

Depending on their formulation, credit risk models can be divided into structural 
or firm-value models on the one hand and reduced-form models on the other; this 
division cuts across that of dynamic and static models. The progenitor of all firm- 
value models is the model of Merton (1974), which postulates a mechanism for the 
default of a firm in terms of the relationship between its assets and the liabilities 
that it faces at the end of a given time period. More generally, in firm-value models 
default occurs whenever a stochastic variable (or in dynamic models a stochastic 
process) generally representing an asset value falls below a threshold representing 
liabilities. For this reason static structural models are referred to in this book as 
threshold models, particularly when applied at portfolio level. The general structural 
model approach is discussed in Section 8.2 (where the emphasis is on modelling the 
default of a single firm). In Section 8.3 we look at threshold models for portfolios; 
in particular we show that copulas play an important role in understanding the 
multivariate nature of these models. 

In reduced-form models the precise mechanism leading to default is left unspec- 
ified. The default time of a firm is modelled as a non-negative rv, whose distribution 
typically depends on economic covariables. The mixture models that we treat in 
Section 8.4 can be thought of as static portfolio versions of reduced-form models. 
More specifically, a mixture model assumes conditional independence of defaults 
given common underlying stochastic factors. 

It is important to realize that mixture models are not a new class of models; 
on the contrary, the most useful static threshold models all have mixture model 
representations, as will be shown in Section 8.4.4. In continuous time a similar 
mapping between firm-value and reduced-form models is also possible if one makes 
the realistic assumption that assets and/or liabilities are not perfectly observable (see 
Notes and Comments). 

From a practical point of view, mixture models represent perhaps the most use- 
ful way of analysing and comparing one-period portfolio credit risk models. For 
these models, Monte Carlo techniques from the area of importance sampling can 
be used to approximate risk measures for the portfolio loss distribution, and to cal- 
culate associated capital allocations, as will be shown in Section 8.5. Moreover, it 
is possible to devise efficient methods of statistical inference for portfolio models 
using historical default data. These models exploit the connection between mixture
models and the well-known class of generalized linear mixed models in statistics; 
this is the topic of Section 8.6. 


8.1.2 The Nature of the Challenge 


Credit risk management poses certain specific challenges for quantitative modelling, 
which are less relevant in the context of market risk. 


Lack of public information and data. Publicly available information regarding the 
credit quality of corporations is typically scarce. This creates problems for corporate 
lending, as the management of a firm is usually better informed about the true 
economic prospects of the firm and hence about default risk than are prospective 
lenders. The implications of this informational asymmetry are widely discussed 
in the microeconomics literature (see Notes and Comments). The lack of publicly 
available credit data is also a substantial obstacle to the use of statistical methods 
in credit risk, a problem that is compounded by the fact that in credit risk the risk- 
management horizon is usually at least one year. It is fair to say that data problems 
are the main obstacle to the reliable calibration of credit models. 


Skewed loss distributions. Typical credit loss distributions are strongly skewed 
with a relatively heavy upper tail. Over the years a typical credit portfolio will 
produce frequent small profits accompanied by occasional large losses. A fairly 
large amount of risk capital is therefore required to sustain such a portfolio: the 
economic capital required for a loan portfolio (the risk capital deemed necessary 
by shareholders and the board of directors of a financial institution, independent 
of the regulatory environment) is often equated to the 99.97% quantile of the loss 
distribution (see Section 1.4.3). 


The role of dependence modelling. A major cause for concern in managing the 
credit risk in a given loan or bond portfolio is the occurrence in a particular time 
period of disproportionately many defaults of different counterparties. This risk is 
directly linked to the dependence structure of the default events. In fact, default 
dependence has a crucial impact on the upper tail of a credit loss distribution for a 
large portfolio. This is illustrated in Figure 8.1, where we compare the loss distri- 
bution for a portfolio of 1000 firms that default independently (portfolio 1) with a 
more realistic portfolio of the same size where defaults are dependent (portfolio 2). 
In portfolio 2 defaults are weakly dependent, in the sense that the correlation between 
default events (see Section 8.3.1) is approximately 0.5%. In both cases the default 
probability is approximately 1% so that on average we expect 10 defaults. As will 
be seen in Section 8.6, portfolio 2 can be viewed as a realistic model for the loss 
distribution generated by a homogeneous portfolio of 1000 loans with a Standard 
& Poor’s rating of BB. We clearly see from Figure 8.1 that the loss distribution of 
portfolio 2 is skewed and that its right tail is substantially heavier than the right 
tail of the loss distribution of portfolio 1, illustrating the drastic impact of default 
dependence on credit loss distributions. Typically, more dependence is reflected in 
the loss distribution by a shift of the mode to the left and a longer right tail. For this
reason we devote a large part of our exposition to the analysis of credit portfolio
models and dependent defaults.

Figure 8.1. Comparison of the loss distribution of two homogeneous portfolios of
1000 loans with a default probability of 1% and different dependence structure. In
portfolio 1 defaults are assumed to be independent; in portfolio 2 we assume a default correlation
of 0.5%. Portfolio 2 can be considered as representative for BB-rated loans. We clearly see
that the default dependence generates a loss distribution with a heavier right tail.
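
The qualitative effect of default dependence can be reproduced with a few lines of code. The following is a sketch only: it assumes that the dependent portfolio is an exchangeable Bernoulli mixture with a beta mixing distribution (so that the number of defaults is beta-binomial), which is one convenient choice and not necessarily the model behind Figure 8.1. The beta parameters are matched to a default probability of 1% and a default correlation of 0.5% using the relations $\pi = a/(a+b)$ and $\rho = 1/(a+b+1)$, which hold for a beta mixing distribution. All names are hypothetical.

```python
import math


def binom_pmf(k, m, p):
    """Binomial pmf computed on the log scale for numerical stability."""
    log_pmf = (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
               + k * math.log(p) + (m - k) * math.log1p(-p))
    return math.exp(log_pmf)


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binom_pmf(k, m, a, b):
    """Beta-binomial pmf: binomial default counts with Beta(a, b) default probability."""
    log_pmf = (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
               + log_beta(k + a, m - k + b) - log_beta(a, b))
    return math.exp(log_pmf)


def quantile(pmf, level):
    """Smallest k such that P(number of defaults <= k) >= level."""
    cum = 0.0
    for k, p in enumerate(pmf):
        cum += p
        if cum >= level:
            return k
    return len(pmf) - 1


def mode(pmf):
    """Most likely number of defaults."""
    return max(range(len(pmf)), key=lambda k: pmf[k])


m, pi, rho = 1000, 0.01, 0.005           # portfolio size, default prob., default corr.
a = pi * (1.0 / rho - 1.0)               # beta parameters matched to (pi, rho)
b = (1.0 - pi) * (1.0 / rho - 1.0)

pmf_indep = [binom_pmf(k, m, pi) for k in range(m + 1)]        # portfolio 1
pmf_dep = [beta_binom_pmf(k, m, a, b) for k in range(m + 1)]   # portfolio 2 (assumed model)

q_indep, q_dep = quantile(pmf_indep, 0.99), quantile(pmf_dep, 0.99)
mode_indep, mode_dep = mode(pmf_indep), mode(pmf_dep)

if __name__ == "__main__":
    print("99% quantile of defaults:", q_indep, "(independent) vs", q_dep, "(dependent)")
    print("mode of defaults:", mode_indep, "(independent) vs", mode_dep, "(dependent)")
```

Both portfolios have the same expected number of defaults; under the assumed mixture the dependent portfolio should show a mode further to the left and a clearly higher upper quantile.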

There are sound economic reasons for expecting default dependence. To begin 
with, the financial health of a firm varies with randomly fluctuating macroeconomic 
factors, such as changes in economic growth. Since different firms are affected 
by common macroeconomic factors, we have dependence between their defaults. 
Moreover, default dependence is caused by direct economic links between firms, 
such as a strong borrower-lender relationship. Given the enormous size of typical
loan portfolios it can be argued that, in credit risk management, direct business 
relations play a less prominent role in explaining default dependence. Dependence 
due to common factors, on the other hand, is of crucial importance and will be 
a recurring theme in our analysis. In the pricing of portfolio credit derivatives, the 
portfolios of interest are smaller, so modelling direct business relationships becomes 
more relevant (see Section 9.8 for models of this kind). 


Notes and Comments 


Chapter 2 of Duffie and Singleton (2003) contains a good discussion of the economic 
principles of credit risk management, elaborating on some of the issues discussed 
above. For a microeconomic analysis of the functioning of credit markets in the 
presence of informational asymmetries between borrowers and lenders we refer to 
the seminal paper by Stiglitz and Weiss (1981). 

Duffie and Lando (2001) established a relationship between firm-value models 
and reduced-form models in continuous time. Essentially, they showed that, from 
the perspective of investors with incomplete accounting information (i.e. incomplete 
information about assets or liabilities of a firm), a firm-value model becomes a 
reduced-form model. A less technical discussion of these issues can be found in 
Jarrow and Protter (2004). 

The available empirical evidence for the existence of macroeconomic common
factors is surveyed in Section 3.1 of Duffie and Singleton (2003). Without going into 
details, it seems that a substantial amount of the variation over time in empirical 
default rates (the proportion of firms with a given credit rating that actually defaulted 
in a given year) can be explained by fluctuations in GDP growth rates, with empirical 
default rates going up in recessions and down in periods of economic recovery. 


8.2 Structural Models of Default 


A model of default is known as a structural or firm-value model when it attempts to 
explain the mechanism by which default takes place. Because the kind of thinking 
embodied in these models has been so influential in the development of the study of 
credit risk and the emergence of industry solutions (like the KMV model discussed 
in Section 8.2.3), we consider this to be the best starting point for a treatment of 
credit risk models. 

From now on we denote a generic stochastic process in continuous time by $(X_t)$;
the value of the process at time $t \ge 0$ is given by the rv $X_t$.


8.2.1 The Merton Model 


The model proposed in Merton (1974) is the prototype of all firm-value models. 
Many extensions of this model have been developed over the years, but Merton’s 
original model remains an influential benchmark and is still popular with practition- 
ers in credit risk analysis. 

Consider a firm whose asset value follows some stochastic process $(V_t)$. The firm
finances itself by equity (i.e. by issuing shares) and by debt. In Merton’s model
debt has a very simple structure: it consists of one single debt obligation or zero-
coupon bond with face value $B$ and maturity $T$. The value at time $t$ of equity and
debt is denoted by $S_t$ and $B_t$ and, if we assume that markets are frictionless (no
taxes or transaction costs), the value of the firm’s assets is simply the sum of these,
i.e. $V_t = S_t + B_t$, $0 \le t \le T$. In the Merton model it is assumed that the firm cannot
pay out dividends or issue new debt. Default occurs if the firm misses a payment to 
its debt holders, which in the Merton model can occur only at the maturity T of the 
bond. At maturity we have to distinguish between two cases. 


(i) $V_T > B$: the value of the firm’s assets exceeds the liabilities. In that case
the debtholders receive $B$, the shareholders receive the residual value
$S_T = V_T - B$, and there is no default.


(ii) $V_T \le B$: the value of the firm’s assets is less than its liabilities and the firm
cannot meet its financial obligations. In that case shareholders have no interest
in providing new equity capital, which would go immediately to the bond-
holders. Instead they “exercise their limited-liability option” and hand over
control of the firm to the bondholders, who liquidate the firm and distribute
the proceeds among themselves. Shareholders pay and receive nothing, so
that we have $B_T = V_T$, $S_T = 0$.

Summarizing, we have the relations

$$ S_T = \max(V_T - B, 0) = (V_T - B)^+, \tag{8.1} $$
$$ B_T = \min(V_T, B) = B - (B - V_T)^+. \tag{8.2} $$

Equation (8.1) implies that the value of the firm’s equity at time $T$ equals the pay-
off of a European call option on $V_T$, while (8.2) implies that the value of the firm’s
debt at maturity equals the nominal value of the liabilities minus the pay-off of a
European put option on $V_T$ with exercise price equal to $B$.

The above model is of course a stylized description of default. In reality the 
structure of a company’s debt is much more complex, so that default can occur on 
many different dates. Moreover, under modern bankruptcy code, default does not 
automatically imply bankruptcy, i.e. liquidation of a firm. Nonetheless, Merton’s 
model is a useful starting point for modelling credit risk and for pricing securities 
subject to default. 


Remark 8.1. The option interpretation of equity and debt is useful in explaining 
potential conflicts of interest between shareholders and debtholders of a company. It 
is well known that the value of an option increases if the volatility of the underlying 
security is increased, provided of course that the mean is not adversely affected. 
Hence shareholders have an interest in the firm taking on very risky projects. Bond- 
holders, on the other hand, have a short position in a put option on the firm’s assets 
and would therefore like to see the volatility of the asset value reduced. 


In the Merton model it is assumed that under the real-world or physical probability
measure $P$ the process $(V_t)$ follows a diffusion model (known as Black-Scholes
model or geometric Brownian motion) of the form

$$ dV_t = \mu_V V_t \, dt + \sigma_V V_t \, dW_t \tag{8.3} $$

for constants $\mu_V \in \mathbb{R}$, $\sigma_V > 0$, and a standard Brownian motion $(W_t)$. Equation (8.3)
implies that $V_T = V_0 \exp((\mu_V - \tfrac{1}{2}\sigma_V^2)T + \sigma_V W_T)$, and, in particular, that
$\ln V_T \sim N(\ln V_0 + (\mu_V - \tfrac{1}{2}\sigma_V^2)T, \sigma_V^2 T)$. Under the dynamics (8.3) the default
probability of our firm is readily computed. We have

$$ P(V_T \le B) = P(\ln V_T \le \ln B) = \Phi\left( \frac{\ln(B/V_0) - (\mu_V - \tfrac{1}{2}\sigma_V^2)T}{\sigma_V \sqrt{T}} \right). \tag{8.4} $$

It is immediately seen from (8.4) that the default probability is increasing in $B$,
decreasing in $V_0$ and $\mu_V$ and, for $V_0 > B$, increasing in $\sigma_V$, which is all perfectly
in line with economic intuition.
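
Formula (8.4) is straightforward to evaluate. The sketch below uses only the Python standard library and writes the standard normal df in terms of the complementary error function; the function names and the numerical inputs are hypothetical and serve only to illustrate the comparative statics just described.

```python
import math


def norm_cdf(x):
    """Standard normal df, written via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def merton_default_prob(V0, B, mu, sigma, T):
    """Real-world default probability P(V_T <= B) in Merton's model, as in (8.4)."""
    d = (math.log(B / V0) - (mu - 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return norm_cdf(d)


if __name__ == "__main__":
    # Illustrative inputs: asset value 100, debt 60, drift 5%, volatility 25%, one year.
    base = merton_default_prob(100.0, 60.0, 0.05, 0.25, 1.0)
    print("base case:        ", base)
    print("higher debt:      ", merton_default_prob(100.0, 70.0, 0.05, 0.25, 1.0))
    print("higher volatility:", merton_default_prob(100.0, 60.0, 0.05, 0.35, 1.0))
```

Raising the debt level or the volatility (with $V_0 > B$) increases the computed default probability, while raising $V_0$ or $\mu_V$ lowers it.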


8.2.2 Pricing in Merton’s Model 


In the context of Merton’s model we can price securities whose pay-off depends 
on the value $V_T$ of the firm’s assets at $T$. Prime examples are the firm’s debt (or,
equivalently, zero-coupon bonds issued by the firm) and the firm’s equity. We briefly 
explain the main results, since we need them in our treatment of the KMV model 
in Section 8.2.3. The derivation of pricing formulas uses basic results from financial
mathematics. Readers not familiar with these results should simply accept the
valuation formulas we present in the remainder of this section as facts and proceed 
quickly to Section 8.2.3; references to useful texts in financial mathematics are given 
in Notes and Comments. 

We make the following assumptions. 


Assumption 8.2. 


(i) We have frictionless markets with continuous trading.


(ii) The risk-free interest rate is deterministic and equal to $r > 0$.


(iii) The firm’s asset-value process $(V_t)$ is independent of the way the firm is
financed, and in particular it is independent of the debt level $B$. Moreover,
$(V_t)$ is a traded security with dynamics given in (8.3).


Assumption (iii) merits some comment. First, the independence of $(V_t)$ from the
financial structure of the firm is questionable, because a very high debt level and 
hence a high default probability may adversely affect the capability of a firm to 
generate business and hence affect the value of its assets. This is a special case of 
the indirect bankruptcy costs discussed in Section 1.4.2. Second, while there are 
many firms with traded equity, the value of the assets of a firm is usually neither 
completely observable nor traded. We come back to this issue in Section 8.2.3 below. 


General pricing results. Consider a claim on the value of the firm with maturity $T$
and pay-off $h(V_T)$, such as the firm’s equity and debt in (8.1) and (8.2), and suppose
that Assumption 8.2 holds. Standard derivative pricing theory offers two ways for
computing the fair value $f(t, V_t)$ of this claim at time $t \le T$. Under the partial
differential equation (PDE) approach the function $f(t, v)$ is computed by solving
the PDE (subscripts denote partial derivatives)

$$ f_t(t, v) + \tfrac{1}{2}\sigma_V^2 v^2 f_{vv}(t, v) + r v f_v(t, v) = r f(t, v) \quad \text{for } t \in [0, T), \tag{8.5} $$

with terminal condition $f(T, v) = h(v)$ reflecting the exact form of the claim to be
priced. Equation (8.5) is the famous Black-Scholes PDE for terminal-value claims.

Alternatively, the value $f(t, V_t)$ can be computed as the expectation of the dis-
counted pay-off under the risk-neutral measure $Q$ (the so-called risk-neutral pricing
approach). Under $Q$ the process $(V_t)$ satisfies the stochastic differential equation
(SDE) $dV_t = r V_t \, dt + \sigma_V V_t \, d\tilde{W}_t$ for a standard $Q$-Brownian motion $\tilde{W}$; in
particular, the drift $\mu_V$ in (8.3) has been replaced by the risk-free interest rate $r$. The
risk-neutral pricing rule now states that

$$ f(t, V_t) = E^Q\bigl(e^{-r(T-t)} h(V_T) \mid \mathcal{F}_t\bigr), \tag{8.6} $$

where $E^Q$ denotes expectation with respect to $Q$. For details we refer to the text-
books on financial mathematics listed in Notes and Comments; the relationship 
between physical probability measure P and risk-neutral measure Q and the eco- 
nomic foundations of the risk-neutral pricing rule will be discussed in more detail 
in Section 9.3. 
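
The risk-neutral pricing rule (8.6) lends itself to a simple Monte Carlo approximation at $t = 0$: simulate $V_T$ under $Q$, average the pay-off and discount. The following is a minimal sketch under Assumption 8.2; the function `mc_price`, the seed and all parameter values are hypothetical. A useful sanity check is the pay-off $h(v) = v$, for which the discounted expectation should return $V_0$ (up to simulation error), since the asset itself is a traded security.

```python
import math
import random


def mc_price(payoff, V0, r, sigma, T, n_paths=100_000, seed=0):
    """Monte Carlo estimate of exp(-rT) * E^Q[payoff(V_T)], i.e. (8.6) at t = 0."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T      # risk-neutral log-drift: mu_V replaced by r
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        V_T = V0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += payoff(V_T)
    return math.exp(-r * T) * total / n_paths


if __name__ == "__main__":
    V0, B, r, sigma, T = 100.0, 60.0, 0.03, 0.25, 2.0    # illustrative values
    asset = mc_price(lambda v: v, V0, r, sigma, T)
    equity = mc_price(lambda v: max(v - B, 0.0), V0, r, sigma, T)   # pay-off (8.1)
    debt = mc_price(lambda v: min(v, B), V0, r, sigma, T)           # pay-off (8.2)
    print("asset:", asset, "equity:", equity, "debt:", debt)
```

Because the same seed is used for each claim, the simulated equity and debt values add up to the simulated asset value, mirroring the identity $V_T = S_T + B_T$ path by path.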

Application to equity and debt. According to (8.1), the firm’s equity corresponds
to a European call on $(V_t)$ with exercise price $B$ and maturity $T$. The solution of the
PDE (8.5), or the risk-neutral value of equity obtained from (8.6), is simply given
by the Black-Scholes price $C^{\mathrm{BS}}$ of a European call. This yields

$$ S_t = C^{\mathrm{BS}}(t, V_t; r, \sigma_V, B, T) := V_t \Phi(d_{t,1}) - B e^{-r(T-t)} \Phi(d_{t,2}), $$

$$ d_{t,1} = \frac{\ln V_t - \ln B + (r + \tfrac{1}{2}\sigma_V^2)(T - t)}{\sigma_V \sqrt{T - t}} \quad \text{and} \quad d_{t,2} = d_{t,1} - \sigma_V \sqrt{T - t}. \tag{8.7} $$

Note that under the risk-neutral measure $Q$ the distribution of the logarithmic asset
value at maturity is given by $\ln V_T \sim N(\ln V_0 + (r - \tfrac{1}{2}\sigma_V^2)T, \sigma_V^2 T)$. Hence we get
at time $t = 0$

$$ Q(V_T \le B) = Q\left( \frac{\ln V_T - (\ln V_0 + (r - \tfrac{1}{2}\sigma_V^2)T)}{\sigma_V \sqrt{T}} \le -d_{0,2} \right) = 1 - \Phi(d_{0,2}), $$

where we have used the fact that $\Phi(d) = 1 - \Phi(-d)$. Hence $1 - \Phi(d_{0,2})$ gives the
risk-neutral default probability. Similarly, $1 - \Phi(d_{t,2})$ gives the risk-neutral default
probability given information up to time $t$.

Next we turn to the valuation of the risky debt issued by the firm. Note that by
Assumption 8.2(ii) the price at $t \le T$ of a default-free zero coupon bond with
maturity $T$ equals $p_0(t, T) = \exp(-r(T - t))$. According to (8.2) we have

$$ B_t = B p_0(t, T) - P^{\mathrm{BS}}(t, V_t; r, \sigma_V, B, T), \tag{8.8} $$

where $P^{\mathrm{BS}}(t, V; r, \sigma_V, B, T)$ denotes the Black-Scholes price of a European put
with strike $B$, maturity $T$ on $(V_t)$ for given interest rate $r$, and volatility $\sigma_V$. It is
well known that

$$ P^{\mathrm{BS}}(t, V_t; r, \sigma_V, B, T) = B e^{-r(T-t)} \Phi(-d_{t,2}) - V_t \Phi(-d_{t,1}), \tag{8.9} $$

with $d_{t,1}$ and $d_{t,2}$ as in (8.7). Combining (8.8) and (8.9) we get

$$ B_t = p_0(t, T) B \Phi(d_{t,2}) + V_t \Phi(-d_{t,1}). \tag{8.10} $$
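
The closed-form values (8.7) and (8.10) can be coded directly. The sketch below is an illustration with hypothetical function names and inputs, where `tau` stands for the time to maturity $T - t$; it also reports the risk-neutral default probability $1 - \Phi(d_{t,2})$. A natural check is that equity and debt add up to the asset value, $S_t + B_t = V_t$.

```python
import math


def norm_cdf(x):
    """Standard normal df, written via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def d1_d2(V, B, r, sigma, tau):
    """The quantities d_{t,1} and d_{t,2} of (8.7), with tau = T - t > 0."""
    d1 = (math.log(V / B) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)


def equity_value(V, B, r, sigma, tau):
    """Equity as a European call on the assets, formula (8.7)."""
    d1, d2 = d1_d2(V, B, r, sigma, tau)
    return V * norm_cdf(d1) - B * math.exp(-r * tau) * norm_cdf(d2)


def debt_value(V, B, r, sigma, tau):
    """Risky debt, formula (8.10)."""
    d1, d2 = d1_d2(V, B, r, sigma, tau)
    return math.exp(-r * tau) * B * norm_cdf(d2) + V * norm_cdf(-d1)


def risk_neutral_default_prob(V, B, r, sigma, tau):
    """Risk-neutral default probability 1 - Phi(d_{t,2})."""
    _, d2 = d1_d2(V, B, r, sigma, tau)
    return 1.0 - norm_cdf(d2)


if __name__ == "__main__":
    V, B, r, sigma, tau = 100.0, 60.0, 0.03, 0.25, 2.0   # illustrative values
    S = equity_value(V, B, r, sigma, tau)
    D = debt_value(V, B, r, sigma, tau)
    print("equity:", S, "debt:", D, "sum:", S + D)
    print("risk-neutral default probability:", risk_neutral_default_prob(V, B, r, sigma, tau))
```

The debt value lies below the value $B p_0(t, T)$ of the corresponding default-free bond; the difference is the put price in (8.8).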


Credit spread. We may use (8.10) to infer the credit spread $c(t, T)$ implied by
Merton’s model. The credit spread measures the difference of the continuously
compounded yield to maturity of a default-free zero coupon bond $p_0(t, T)$ and of a
defaultable zero coupon bond $p_1(t, T)$ and is defined by

$$ c(t, T) = \frac{-1}{T - t}\bigl(\ln p_1(t, T) - \ln p_0(t, T)\bigr) = \frac{-1}{T - t} \ln \frac{p_1(t, T)}{p_0(t, T)}. \tag{8.11} $$

In Merton’s model we obviously have $p_1(t, T) = (1/B) B_t$ and hence

$$ c(t, T) = \frac{-1}{T - t} \ln\left( \Phi(d_{t,2}) + \frac{V_t}{B p_0(t, T)} \Phi(-d_{t,1}) \right). \tag{8.12} $$

Figure 8.2. Credit spread $c(t, T)$ in per cent as a function of (a) the firm’s volatility $\sigma_V$ and
(b) the time to maturity $\tau = T - t$ for fixed leverage measure $d = 0.6$ (in (a) $\tau = 2$ years; in
(b) $\sigma_V = 0.25$). Note that, for a time to maturity smaller than approximately three months,
the credit spread implied by Merton’s model is basically equal to zero. This is not in line with 
most empirical studies of corporate bond spreads and has given rise to a number of extensions 
of Merton’s model, which are listed in Notes and Comments. We will see in Section 9.4.4 
that reduced-form models lead to a more reasonable behaviour of short-term credit spreads. 


Since $d_{t,1}$ can be rewritten as

$$ d_{t,1} = \frac{-\ln(B p_0(t, T)/V_t) + \tfrac{1}{2}\sigma_V^2 (T - t)}{\sigma_V \sqrt{T - t}}, $$

and similarly for $d_{t,2}$, we conclude that, for a fixed time to maturity $T - t$, the spread
$c(t, T)$ depends only on the volatility $\sigma_V$ and on the ratio $d := B p_0(t, T)/V_t$, which
is the ratio of the present value of the firm’s debt to the value of the firm’s assets and
hence a measure of the relative debt level or leverage of the firm. As the price of a
European put is increasing in the volatility, it is immediate from (8.8) that $c(t, T)$
is increasing in $\sigma_V$. In Figure 8.2 we plot the credit spread as a function of $\sigma_V$ and
of the time to maturity $\tau = T - t$.
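
The dependence of the spread on volatility, leverage and time to maturity can be explored with a short function. This is a sketch of (8.12) written in terms of the leverage measure $d = B p_0(t, T)/V_t$, using $V_t/(B p_0(t, T)) = 1/d$; the function name `credit_spread` and the grid of inputs are hypothetical, with the leverage value 0.6 taken from the caption of Figure 8.2.

```python
import math


def norm_cdf(x):
    """Standard normal df, written via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def credit_spread(d, sigma, tau):
    """Credit spread (8.12) for leverage d = B p0(t,T) / V_t and time to maturity tau."""
    d1 = (-math.log(d) + 0.5 * sigma ** 2 * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return -math.log(norm_cdf(d2) + norm_cdf(-d1) / d) / tau


if __name__ == "__main__":
    d = 0.6
    # Panel (a) of Figure 8.2: spread in per cent as a function of volatility, tau = 2.
    for sigma in (0.10, 0.20, 0.30, 0.40, 0.50):
        print("sigma =", sigma, "spread (%) =", 100.0 * credit_spread(d, sigma, 2.0))
    # Panel (b): spread in per cent as a function of time to maturity, sigma = 0.25.
    for tau in (0.25, 0.5, 1.0, 2.0, 5.0):
        print("tau =", tau, "spread (%) =", 100.0 * credit_spread(d, 0.25, tau))
```

For short maturities the computed spread should be very close to zero, which is the feature of Merton’s model criticized in the caption of Figure 8.2.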


Extensions. Merton’s model is quite simplistic. Over the years this has given rise
to a rich literature on firm-value models. We briefly comment on the most important
research directions (bibliographic references are given in Notes and Comments).
To begin with, the observation that, in reality, firms can default at essentially any
time (and not only at a deterministic point in time $T$) has led to the development of
so-called first-passage-time models. In this class of model default occurs when the
asset-value process crosses for the first time a default threshold $B$, which is usually
interpreted as the average value of the liabilities. Formally, the default time $\tau$ is
defined by $\tau = \inf\{t \ge 0 : V_t \le B\}$. Further technical developments include
models with stochastic default-free interest rates and models where the asset-value
process $(V_t)$ is given by a diffusion with jumps.

Firm-value models with endogenous default threshold are an interesting eco- 
nomic extension of Merton’s model. Here the default boundary B is determined 
endogenously by the strategic considerations of the shareholders and not fixed a
priori by the modeller. Finally, structural models with incomplete information on 
asset value and/or liabilities provide an important link between the structural and 
the reduced-form approach to credit risk modelling. 


8.2.3 The KMV Model


An important example of an industry model that descends from the Merton model 
is the KMV model, which was developed by KMV (a private company named after 
its founders Kealhofer, McQuown and Vasicek) in the 1990s and which is now 
maintained by Moody’s KMV. The KMV model is widely used in industry: Berndt 
et al. (2004) report that 40 out of the world’s largest 50 financial institutions are 
subscribers to the model. The major contribution of KMV is not the theoretical 
development of the model, which is a relatively straightforward extension of the 
Merton model, but its empirical testing and implementation using a huge proprietary 
database. Our presentation of the KMV model follows Crosbie and Bohn (2002) and 
Crouhy, Galai and Mark (2000). We have omitted certain details of the model, since 
detailed information about actual implementation and calibration procedures is hard 
to obtain; indeed, such procedures are likely to change as the model is developed 
further. 


Overview. The key quantity of interest in the KMV model is the so-called expected 
default frequency (EDF); this is simply the probability (under the physical proba- 
bility measure P) that a given firm will default within one year as estimated using 
the KMV methodology. Recall that in the classic Merton model the default prob- 
ability of a given firm is given by the probability that the asset value in one year, 
$V_1$ say, lies below the threshold $B$ representing the overall liabilities of the firm.
Under Assumption 8.2, the EDF is a function of the current asset value $V_0$, the asset
value’s annualized mean $\mu_V$ and volatility $\sigma_V$ and the threshold $B$; using (8.4) and
recalling that $\Phi(d) = 1 - \Phi(-d)$, with $T = 1$, we get


$$\mathrm{EDF}_{\mathrm{Merton}} = 1 - \Phi\left(\frac{\ln V_0 - \ln B + (\mu_V - \tfrac{1}{2}\sigma_V^2)}{\sigma_V}\right). \tag{8.13}$$
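As a small numerical illustration of (8.13), the sketch below evaluates the Merton default probability for a one-year horizon. The function name and the input values are hypothetical and serve only to show the mechanics.

```python
from math import log
from statistics import NormalDist

def edf_merton(v0, b, mu_v, sigma_v):
    """One-year Merton default probability as in (8.13)."""
    arg = (log(v0) - log(b) + (mu_v - 0.5 * sigma_v**2)) / sigma_v
    return 1.0 - NormalDist().cdf(arg)

# Hypothetical firm: assets 100, liabilities 60, drift 5%, volatility 25%.
edf = edf_merton(100.0, 60.0, 0.05, 0.25)
# The same firm with a higher asset volatility.
edf_high_vol = edf_merton(100.0, 60.0, 0.05, 0.35)
```

Holding the other inputs fixed, a higher asset volatility or a higher liability level moves the argument of $\Phi$ towards zero and so raises the default probability in this example.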


In the KMV model the EDF has a similar structure; however, $1 - \Phi$ is replaced
by some decreasing function which is estimated empirically, $B$ is replaced by a
new default threshold $\tilde{B}$ representing the structure of the firm’s liabilities more
closely, and the argument of the normal df in (8.13) is replaced by a slightly simpler
expression. Moreover, KMV does not assume that the asset value $V_0$ of the firm is
directly observable. Rather, it uses an iterative technique to infer $V_0$ from the value
of the firm’s equity.


Determination of the asset value. Firm-value-based credit risk models usually take 
the market value of the firm’s assets as a primitive. The market value reflects investor 
expectations about the business prospects of the firm and is hence a good measure 
of the value of its ongoing business. Unfortunately, the market value of a firm is 
typically not fully observable for a number of reasons. To begin with, market value 
can differ widely from the value of a company as measured by accountancy rules
(the book value) (see, for instance, Section 3.1.1 of Crouhy, Galai and Mark (2000) 
for an example). Moreover, while the market value of the firm’s assets is simply 
the sum of the market values of the firm’s equity and debt, only the equity and 
parts of the debt, such as bonds issued by the firm, are actively traded, so that we 
do not know the market value of the entire debt. For these reasons KMV relies on 
an indirect approach and infers the asset value $V_0$ from the more easily observed
value $S_0$ of a firm’s equity.

We explain the approach in the context of the Merton model. Recall that under 
Assumption 8.2 we have that 


$$S_t = C^{\mathrm{BS}}(t, V_t; r, \sigma_V, B, T). \tag{8.14}$$

Obviously, at a fixed point in time, $t = 0$ say, (8.14) is an equation with two
unknowns, $V_0$ and $\sigma_V$. To overcome this difficulty, KMV uses an iterative
procedure. In step (1), (8.14) with some initial estimate $\sigma_V^{(0)}$ is used to infer a time
series of asset values $(V_t^{(0)})$ from equity values. Then a new volatility estimate $\sigma_V^{(1)}$
is constructed from this time series; a new time series $(V_t^{(1)})$ is then constructed
using (8.14) with $\sigma_V^{(1)}$. This procedure is iterated several times (see Crosbie and
Bohn (2002) for details).
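The iteration can be sketched in the plain Merton setting of (8.14). This is not KMV's actual implementation, whose details are proprietary; it is an illustration under simplifying assumptions: the time to maturity is held constant along the series, the inversion uses bisection, and all names and numbers are hypothetical. The data are simulated so that the true volatility is known.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist, stdev

PHI = NormalDist().cdf  # standard normal df

def bs_call(v, r, sigma, b, ttm):
    """Black-Scholes call on assets v with strike b: right-hand side of (8.14)."""
    d1 = (log(v / b) + (r + 0.5 * sigma**2) * ttm) / (sigma * sqrt(ttm))
    d2 = d1 - sigma * sqrt(ttm)
    return v * PHI(d1) - b * exp(-r * ttm) * PHI(d2)

def implied_asset_value(s, r, sigma, b, ttm):
    """Solve bs_call(v) = s by bisection; the root lies in [s, s + b] when r >= 0."""
    lo, hi = s, s + b
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if bs_call(mid, r, sigma, b, ttm) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def annualized_vol(values, dt):
    """Sample standard deviation of log-returns, scaled to annual units."""
    returns = [log(y / x) for x, y in zip(values, values[1:])]
    return stdev(returns) / sqrt(dt)

def iterate_asset_vol(equity, r, b, ttm, dt, sigma0=0.5, tol=1e-6, max_iter=50):
    """Alternate between implied asset values and a re-estimated volatility."""
    sigma = sigma0
    for _ in range(max_iter):
        assets = [implied_asset_value(s, r, sigma, b, ttm) for s in equity]
        new_sigma = annualized_vol(assets, dt)
        if abs(new_sigma - sigma) < tol:
            break
        sigma = new_sigma
    return assets, new_sigma

# Synthetic check: simulate assets with known volatility, price the equity, then recover.
random.seed(0)
dt, true_sigma, r, b, ttm = 1 / 250, 0.25, 0.03, 60.0, 1.0
v, true_assets = 100.0, []
for _ in range(250):
    true_assets.append(v)
    v *= exp((0.05 - 0.5 * true_sigma**2) * dt + true_sigma * sqrt(dt) * random.gauss(0.0, 1.0))
equity = [bs_call(a, r, true_sigma, b, ttm) for a in true_assets]
assets_hat, sigma_hat = iterate_asset_vol(equity, r, b, ttm, dt)
```

In this synthetic setting the recovered volatility should lie close to the value used in the simulation, up to sampling error in the simulated returns.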

In the version of the model that is actually implemented, the capital structure 
of the firm is modelled in a more sophisticated manner than in Merton’s model. 
The equity value is thus no longer given by (8.14), but by some different function 
$f(t, V_t; r, \sigma_V, d, T, c)$, which has to be computed numerically. Here $d$ represents
the leverage ratio of the firm and c is the average coupon paid by the long-term debt. 
The philosophy of the approach is, however, exactly as described above. 


Calculation of EDFs. In the Merton model default, and hence bankruptcy, occurs 
if the value of a firm’s assets falls below the value of its liabilities. With lognormally 
distributed asset values, as implied for instance by Assumption 8.2, this leads to 
default probabilities of the form (8.13). This relationship between asset value and 
default probability may be too simplistic to be an accurate description of actual 
default probabilities for a number of reasons: asset values are not necessarily
lognormal but might follow a distribution with heavy tails; our assumptions about
the capital structure of the firm are too simplistic; there might be payments due 
at an intermediate point in time causing default at that date; finally, under modern 
bankruptcy code, default need not automatically lead to bankruptcy, i.e. to liquida- 
tion of the firm. 

To account for these factors, KMV introduces as an intermediary step a state 
variable, the so-called distance to default (DD), given by 


$$\mathrm{DD} := (V_0 - \tilde{B})/(\sigma_V V_0), \tag{8.15}$$


where $\tilde{B}$ represents the default threshold (often the liabilities payable within one
year). Sometimes practitioners call the distance to default the “number of standard
deviations a company is away from its default threshold $\tilde{B}$”. Note that (8.15) is in
fact an approximation of the argument of (8.13), since $\mu_V$ and $\sigma_V$ are small and
since $\ln V_0 - \ln \tilde{B} \approx (V_0 - \tilde{B})/V_0$.

Table 8.1. A summary of the KMV approach. The example is taken from Crosbie and Bohn
(2002); it is concerned with the situation of Philip Morris Inc. at the end of April 2001.
Financial quantities are in millions of US dollars.

Variable                        Value     Notes
Market value of equity $S_0$    110688    Share price × shares outstanding
Overall book liabilities $B$    64062     Determined from balance sheet
Market value of assets $V_0$    170558    Determined from option-pricing model
Asset volatility $\sigma_V$     0.21      Determined from option-pricing model
Default threshold $\tilde{B}$   47499     Liabilities payable within one year
DD                              3.5       Given by the ratio (170 - 47)/(0.21 × 170), using relation (8.15)
EDF (one year)                  0.25%     Determined using empirical mapping between distance to default and default frequency
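The distance to default in (8.15) is easy to evaluate from the inputs of Table 8.1. The sketch below also evaluates the argument of (8.13) with the KMV threshold in place of $B$; the drift value used there is a hypothetical assumption, since the table does not report one, and the function names are illustrative only.

```python
from math import log

def distance_to_default(v0, b_tilde, sigma_v):
    """Distance to default as in (8.15)."""
    return (v0 - b_tilde) / (sigma_v * v0)

def merton_argument(v0, b_tilde, mu_v, sigma_v):
    """Argument of the normal df in (8.13), with b_tilde in place of B."""
    return (log(v0) - log(b_tilde) + (mu_v - 0.5 * sigma_v**2)) / sigma_v

# Inputs from Table 8.1 (millions of US dollars).
v0, b_tilde, sigma_v = 170558.0, 47499.0, 0.21
dd = distance_to_default(v0, b_tilde, sigma_v)
# mu_v = 0.05 is a hypothetical drift, not taken from the table.
arg = merton_argument(v0, b_tilde, 0.05, sigma_v)
```

With unrounded inputs the distance to default comes out close to, though not identical with, the rounded figure in the table. Because $\ln V_0 - \ln \tilde{B}$ always exceeds its first-order approximation $(V_0 - \tilde{B})/V_0$, the two quantities can differ noticeably when the threshold lies far below the asset value.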

In the KMV model it is assumed that firms with equal DD have equal default
probabilities. The functional relationship between DD and EDF is determined empirically.
Using a database of historical default events, KMV estimates for every horizon the 
proportion of firms with DD in a given small range that defaulted within the given 
horizon. This proportion is the empirically estimated EDF. As one would expect, the 
empirically estimated EDF is a decreasing function; its precise form is proprietary 
to Moody’s KMV. 
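The empirical mapping can be pictured as a bucketing exercise. The following sketch groups firms by distance to default and computes the observed default frequency per bucket; the bucket width, the function name and the data are hypothetical, and the actual KMV mapping is proprietary.

```python
def empirical_edf(records, bucket_width=1.0):
    """Map each DD bucket (keyed by its lower edge) to the observed default frequency.

    records: iterable of (dd, defaulted) pairs, one per firm for a fixed horizon.
    """
    counts = {}
    for dd, defaulted in records:
        bucket = int(dd // bucket_width)
        n, k = counts.get(bucket, (0, 0))
        counts[bucket] = (n + 1, k + int(defaulted))
    return {b * bucket_width: k / n for b, (n, k) in sorted(counts.items())}

# Hypothetical observations (not real default data).
records = [(0.4, True), (0.8, True), (0.9, False), (1.2, True),
           (1.5, False), (1.7, False), (1.9, False), (2.3, False),
           (2.6, False), (2.8, False)]
edf_by_bucket = empirical_edf(records)
```

In practice the buckets would be much narrower and the sample far larger, so that the estimated frequencies form a smooth decreasing function of DD.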

In Table 8.1, we illustrate the computation of the EDF using the KMV approach 
for Philip Morris Inc. 


8.2.4 Models Based on Credit Migration 


In this section we present models where the default probability of a given firm is 
determined from an analysis of credit migration. The standard industry model in 
this class is CreditMetrics, developed by JPMorgan and the RiskMetrics Group (see, 
for instance, RiskMetrics Group 1997). We first describe the basic idea of a credit- 
migration model and the kind of data that is used to calibrate such a model, before 
showing how a migration model can be embedded in a firm-value model and thus 
treated as a structural model. 


Credit ratings and migration. In the credit-migration approach each firm is 
assigned to a credit-rating category at any given time point. There are a finite number 
of such ratings and they are ordered by credit quality and include the category of 
default. The probability of moving from one credit rating to another credit rating 
over the given risk horizon (typically one year) is then specified. 

Credit ratings for major companies or sovereigns and rating-transition matrices 
are provided by rating agencies such as Moody’s or Standard & Poor’s (S&P); 
alternatively, proprietary rating systems internal to a financial institution can be 
used. In the S&P rating system there are seven rating categories (AAA, AA, A,
BBB, BB, B, CCC) with AAA being the highest and CCC the lowest rating of
companies which have not defaulted; there is also a default state. Moody’s uses
seven pre-default rating categories labelled Aaa, Aa, A, Baa, Ba, B, C; a finer
alphanumeric system is also in use. Transition probabilities are typically presented in the
form of a rating-transition probability matrix; an example from Standard & Poor’s
is presented in Table 8.2. These transition matrices are determined from historical
default data. Approaches for estimating rating-transition matrices are listed in Notes
and Comments.

Table 8.2. Probabilities of migrating from one rating quality to another within one year.
Source: Standard & Poor’s CreditWeek (15 April 1996).

Initial rating    Rating at year-end (%): AAA  AA  A  BBB  BB  B  CCC  Default

In the credit-migration approach one assumes that the current credit rating com- 
pletely determines the default probability, so that this probability can be read off 
from the transition matrix. For instance, if we use the transition matrix presented 
in Table 8.2, we obtain that the one-year default probability of a company whose 
current S&P credit rating is A is 0.06%, whereas the default probability of a CCC- 
rated company is almost 20%. Rating agencies also produce cumulative default 
probabilities over larger time horizons. In Table 8.3 we present estimates (due to 
Standard & Poor’s) for cumulative default probabilities of companies with a given 
current credit rating. For instance, according to this table the probability that a 
company whose current credit rating is BBB defaults within the next four years 
is 1.27%. These cumulative default probabilities have been estimated directly. Alter- 
natively, we could have used the one-year transition matrix presented in Table 8.2 
to estimate these numbers. If we assume that the credit-migration process follows a 
time-homogeneous Markov chain, the n-year transition matrix is simply the n-fold 
product of the one-year transition matrix, and the n-year default probabilities can be 
read off from the last column of the n-year transition matrix. Of course, under the 
Markov assumption, both approaches should produce roughly similar results. In the 
BBB-case above, the four-year default probability under the Markov assumption 
becomes 1.41%, whereas the cumulative default probability for a BBB company 
according to Table 8.3 is 1.27%, which is relatively close. Nonetheless, the hypoth- 
esis that rating transitions occur in a Markovian way has been criticized heavily on 
empirical grounds (see Notes and Comments). 
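The Markov-chain calculation of multi-year default probabilities can be sketched as follows. The three-state matrix is hypothetical and is not taken from Table 8.2; the function names are illustrative.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_year_matrix(p, n):
    """n-fold product of the one-year transition matrix p."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result

# Hypothetical one-year matrix with states (A, B, Default); default is absorbing.
P1 = [[0.90, 0.08, 0.02],
      [0.10, 0.80, 0.10],
      [0.00, 0.00, 1.00]]
P4 = n_year_matrix(P1, 4)
# Last column: cumulative four-year default probabilities by initial rating.
four_year_pd = [row[-1] for row in P4]
```

Because default is modelled as an absorbing state, the last column of the n-year matrix is non-decreasing in n and can be compared directly with directly estimated cumulative default rates.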
Table 8.3. Average cumulative default rates (%).
Source: Standard & Poor’s CreditWeek (15 April 1996).

Initial                          Term (years)
rating       1       2       3       4       5       7      10      15
AAA       0.00    0.00    0.07    0.15    0.24    0.66    1.40    1.40
AA        0.00    0.02    0.12    0.25    0.43    0.89    1.29    1.48
A         0.06    0.16    0.27    0.44    0.67    1.12    2.17    3.00
BBB       0.18    0.44    0.72    1.27    1.78    2.99    4.34    4.70
BB        1.06    3.48    6.12    8.68   10.97   14.46   17.73   19.91
B         5.20   11.00   15.95   19.40   21.88   25.14   29.02   30.65
CCC      19.79   26.92   31.63   35.97   40.15   42.64   45.10   45.10


Remark 8.3 (accounting for business cycles). As discussed in Section 8.1, empir- 
ical default rates tend to vary with the state of the economy, being high during 
recessions and low during periods of economic expansion. Transition rates as esti- 
mated by Standard & Poor’s or Moody’s on the other hand are historical averages 
over longer time horizons covering several business cycles. Moreover, rating agen- 
cies focus on the average credit quality “through the business cycle” when attribut- 
ing a credit rating to a particular firm. Hence the default probabilities from the 
credit-migration approach are estimates for the average default probability, inde- 
pendent of the current economic environment. In situations where we are interested 
in “point-in-time” estimates of default probabilities reflecting the current macro- 
economic environment, such as in the pricing of a short-term loan, adjustments to 
the long-term average default probabilities from the credit-migration approach have 
to be made. For instance, we could use equity prices as an additional source of 
information, as is done in the KMV approach. 


The KMV model and credit-migration approaches compared. The KMV approach 
has the following advantages. 


• Rating agencies are typically slow in adjusting their credit ratings, so that the
current rating does not always reflect the economic condition of a firm. This 
is particularly important if the credit quality of a firm deteriorates rapidly, 
as is typically the case with companies which are close to default. The EDF 
as estimated by KMV, on the other hand, reacts quickly to changes in the 
economic prospects of a firm, as these tend to be reflected in the firm’s share 
price and hence in the estimated distance to default. Examples that show that 
the KMV approach often detects a deterioration in the credit quality of a 
company prior to a downgrading by the rating agencies are given in Crosbie 
and Bohn (2002). 


• EDFs tend to reflect the current macroeconomic environment. The distance to
default is observed to rise in periods of economic expansion (essentially due 
to higher share prices reflecting better economic conditions) and to decrease 
in recession periods. The historical rating-transition probabilities provided by 
Moody’s and Standard & Poor’s on the other hand are relatively insensitive
to the current macroeconomic environment. Hence KMV’s EDFs might be 
better predictors of default probabilities over short time horizons. 


The following points are drawbacks of the KMV methodology and can be viewed 
as advantages of the credit-migration approach. 


• The KMV methodology is quite sensitive to global over- and underreaction
of equity markets. In particular, the breaking of a stock market bubble may 
lead to drastically increased EDFs, even if the economic outlook for a given 
corporation has not changed very much. This can be problematic if a KMV- 
type model is widely used to determine the regulatory capital that a bank 
needs to support its loan book. The breaking of a stock market bubble might 
lead to a substantial increase in the required regulatory capital. This limits the 
ability of banks to supply new credit, which might have an adverse impact on 
the real economy. This is a prime example of the potential negative-feedback 
effects of risk management and regulation that we discussed in Section 1.3 
under the label “the crocodile of risk management is (possibly) eating its own 
tail”. 


• Finally, the KMV methodology as presented here applies only to firms with
publicly traded stock, whereas a ratings-based approach can be applied to all 
companies for which some internal rating is available. 


Credit-migration models as firm-value models. We now show how credit-migra- 
tion models such as CreditMetrics can be embedded in a firm-value model of the 
Merton type. We consider a firm which has been assigned to some rating category at 
the outset of the time period of interest [0, T] and for which transition probabilities 
$p(j)$, $0 \le j \le n$, are available on the basis of that rating. These express the
probability that the firm belongs to rating class $j$ at the time horizon $T$ and constitute a
row of some table similar to Table 8.2. In particular, $p(0)$ is the default probability
of the firm.

Suppose that the asset-value process $(V_t)$ of the firm follows the model given
in (8.3), so that

$$V_T = V_0 \exp((\mu_V - \tfrac{1}{2}\sigma_V^2)T + \sigma_V W_T) \tag{8.16}$$

is lognormally distributed. We can now choose thresholds

$$-\infty = \tilde{d}_0 < \tilde{d}_1 < \cdots < \tilde{d}_n < \tilde{d}_{n+1} = \infty \tag{8.17}$$

such that $P(\tilde{d}_j < V_T \le \tilde{d}_{j+1}) = p(j)$ for $j \in \{0, \dots, n\}$. Thus we have translated
the transition probabilities into a series of thresholds for an assumed asset-value
process. The threshold $\tilde{d}_1$ is the default threshold; in the Merton model of Section 8.2.1,
$\tilde{d}_1$ was interpreted as the value of the firm’s liabilities. The higher thresholds are the
asset-value levels that mark the boundaries of higher rating categories. The
firm-value model in which we have embedded the migration model can be summarized
by saying that the firm belongs to rating class $j$ at the time horizon $T$ if and only if
$\tilde{d}_j < V_T \le \tilde{d}_{j+1}$.
The migration probabilities in the firm-value model obviously remain invariant
under simultaneous strictly increasing transformations of $V_T$ and the thresholds $\tilde{d}_j$.
If we define

$$X_T := \frac{\ln V_T - \ln V_0 - (\mu_V - \tfrac{1}{2}\sigma_V^2)T}{\sigma_V \sqrt{T}}, \tag{8.18}$$

$$d_j := \frac{\ln \tilde{d}_j - \ln V_0 - (\mu_V - \tfrac{1}{2}\sigma_V^2)T}{\sigma_V \sqrt{T}}, \tag{8.19}$$

then we can equivalently say that the firm belongs to rating class $j$ at the time
horizon $T$ if and only if $d_j < X_T \le d_{j+1}$. Observe that $X_T$ is a standardized version
of the asset-value log-return $\ln V_T - \ln V_0$; we can easily verify that $X_T = W_T/\sqrt{T}$
and that it therefore has a standard normal distribution.
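On the standardized scale the thresholds are simply standard normal quantiles of the cumulative transition probabilities, since $P(X_T \le d_j) = p(0) + \cdots + p(j-1)$. A minimal sketch with hypothetical transition probabilities and illustrative names:

```python
from itertools import accumulate
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def standardized_thresholds(p):
    """Thresholds d_1 < ... < d_n on the standard normal scale.

    p[j] is the probability of ending in rating class j (class 0 = default),
    so d_j is the standard normal quantile of p[0] + ... + p[j-1].
    """
    cumulative = list(accumulate(p))[:-1]  # drop the final value 1
    return [N01.inv_cdf(c) for c in cumulative]

# Hypothetical transition probabilities for classes 0 (default) to 3.
p = [0.01, 0.09, 0.60, 0.30]
d = standardized_thresholds(p)
# Recover the class probabilities from the thresholds as a check.
cdf = [0.0] + [N01.cdf(x) for x in d] + [1.0]
recovered = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
```

The thresholds on the asset-value scale follow by inverting (8.19), which requires values for $V_0$, $\mu_V$, $\sigma_V$ and $T$.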


8.2.5 Multivariate Firm-Value Models 


The firm-value models of this section, such as KMV and CreditMetrics, have been 
discussed in relation to the default (or credit migration) risk of a single firm. In 
order to apply these models at portfolio level we require a multivariate version of 
Merton’s model. 

Now assume that we have $m$ companies and that the multivariate asset-value
process $(V_t)$ with $V_t = (V_{t,1}, \dots, V_{t,m})'$ follows an $m$-dimensional geometric
Brownian motion with drift vector $\mu_V = (\mu_{V1}, \dots, \mu_{Vm})'$, vector of volatilities
$\sigma_V = (\sigma_{V1}, \dots, \sigma_{Vm})'$ and instantaneous correlation matrix $P$.

This implies that for all $i$ the asset value $V_{T,i}$ is of the form (8.16), with $\mu_V =
\mu_{Vi}$, $\sigma_V = \sigma_{Vi}$ and $W_T = W_{T,i}$. Moreover, $W_T := (W_{T,1}, \dots, W_{T,m})'$ is
a multivariate normal random vector satisfying $W_T \sim N_m(0, TP)$. The model
is completed by setting thresholds as in (8.17) for each firm: in a Merton-style
model each firm would have a default threshold corresponding to liabilities, and in
a CreditMetrics model the thresholds would be determined by the credit-migration
probabilities of the firms. Note that we could again transform asset values and
thresholds using transformations of the form (8.18) and (8.19). This would result in
variables $X_{T,i} = W_{T,i}/\sqrt{T}$ satisfying $X_T = (X_{T,1}, \dots, X_{T,m})' \sim N_m(0, P)$ and
the model would have again been translated onto a standard Gaussian scale. Models
of this kind will be studied in more detail in Section 8.3.


Notes and Comments 


There are many excellent texts, at varying technical levels, in which the basic results 
on mathematical finance used in Section 8.2.2 can be found. Models in discrete 
time are discussed in Cox and Rubinstein (1985), Jarrow and Turnbull (1999) and 
in the more advanced book by Föllmer and Schied (2004). Excellent introductions
to continuous-time models include Baxter and Rennie (1996), Duffie (2001),
Björk (1998), Bingham and Kiesel (1998) and Lamberton and Lapeyre (1996). More
advanced texts are Musiela and Rutkowski (1997) and Karatzas and Shreve (1998); 
the technical level of the latter two volumes is not needed in this book. 
Lando (2004) gives a good overview of the rich literature on firm-value models.
First-passage-time models have been considered by, among others, Black and Cox
(1976) and, including stochastic interest rates, by Longstaff and Schwartz (1995).
The problem of the unrealistically low credit spreads for small maturities $\tau = T - t$,
which we pointed out in Figure 8.2, has also led to extensions of Merton’s model. 
Partial remedies within the class of firm-value models include models with jumps 
in the firm value, as in Zhou (2001), time-varying default thresholds, as in Hull 
and White (2001), stochastic volatility models for the firm-value process with time- 
dependent dynamics, as in Overbeck and Schmidt (2003), and incomplete informa- 
tion on firm value or default threshold, as in Duffie and Lando (2001) and Giesecke 
(2005). Models with endogenous default thresholds have been considered by, among 
others, Leland (1994), Leland and Toft (1996) and Hilberink and Rogers (2002). 

The original documentation for the KMV model is Crosbie and Bohn (2002) (for 
the modelling of default of a single entity) and Kealhofer and Bohn (2001) (for 
portfolio aspects of the model). Moreover, Moody’s KMV has recently developed 
a private firm model, which provides EDFs for small-to-medium-size private firms 
without publicly traded stock (see Nyberg, Sellers and Zhang 2001). 

A good discussion of industry models for credit risk is given in Crouhy, Galai 
and Mark (2000) (see also Chapters 8–10 of Crouhy, Galai and Mark (2001) for a
more detailed presentation). Chapter 7 of Crouhy, Galai and Mark (2001) contains 
useful background information on credit-rating systems. Statistical approaches to 
the estimation of rating-transition matrices are discussed in Hu, Kiesel and Perraudin 
(2002) and in Lando and Skødeberg (2002). The latter paper also shows that there
is some momentum in rating-transition data, which contradicts the assumption that 
rating transitions form a Markov chain. The literature on statistical properties of 
rating transitions is surveyed extensively in Chapter 4 of Duffie and Singleton (2003). 


8.3 Threshold Models 


The models of this section are one-period models for portfolio credit risk inspired by 
the firm-value models of Section 8.2. Their defining attribute is the idea that default 
occurs for a company $i$ when some critical rv $X_i := X_{T,i}$ lies below some critical
deterministic threshold $d_i$ at the end of the time period $[0, T]$. In Merton’s model $X_i$
is a lognormally distributed asset value and $d_i$ represents liabilities; in CreditMetrics
$X_i$ is a normally distributed rv, interpreted as a change in logarithmic asset value.
Portfolio extensions of firm-value models typically use multivariate lognormal or
normal distributions for the vector $X = (X_1, \dots, X_m)'$. The dependence among
defaults stems from the dependence among the components of the vector X. 

The very general set-up of the threshold models of this section will allow both 
more general interpretations for the critical variable and more general distributional 
models. For example, in Li’s model, discussed in Example 8.7, the critical variables 
are the “times to default” of the firms, and the critical threshold is the time horizon T 
itself. The distributions assumed for X can be completely general and indeed a major 
issue of this section will be the influence of the copula of the multivariate distribution 
of X on the risk of the portfolio. 
8.3.1 Notation for One-Period Portfolio Models


It is convenient to introduce some notation for one-period portfolio models which 
will be in force throughout the remainder of the chapter. We consider a portfolio of 
$m$ obligors and fix a time horizon $T$. For $1 \le i \le m$, we let the rv $S_i$ be a state
indicator for obligor $i$ at time $T$ and assume that $S_i := S_{T,i}$ takes integer values
in the set $\{0, 1, \dots, n\}$ representing, for example, rating classes; as in the previous
section, we interpret the value 0 as default and non-zero values as states of increasing 
credit quality. At time t = 0 obligors are assumed to be in some non-default state. 

Mostly we will concentrate on the binary outcomes of default and non-default 
and ignore the finer categorization of non-defaulted companies. In this case we write 
$Y_i := Y_{T,i}$ for the default indicator variables so that $Y_i = 1 \iff S_i = 0$ and
$Y_i = 0 \iff S_i > 0$. The random vector $Y = (Y_1, \dots, Y_m)'$ is a vector of default
indicators for the portfolio and $p(y) = P(Y_1 = y_1, \dots, Y_m = y_m)$, $y \in \{0, 1\}^m$,
is its joint probability function; the marginal default probabilities are denoted by
$p_i = P(Y_i = 1)$, $i = 1, \dots, m$.

The default or event correlations will be of particular interest to us; they are
defined to be the correlation of the default indicators. Because

$$\operatorname{var}(Y_i) = E(Y_i^2) - p_i^2 = E(Y_i) - p_i^2 = p_i - p_i^2,$$

we obtain for firms $i$ and $j$, with $i \ne j$,

$$\rho(Y_i, Y_j) = \frac{E(Y_i Y_j) - p_i p_j}{\sqrt{(p_i - p_i^2)(p_j - p_j^2)}}. \tag{8.20}$$

We count the number of defaulted obligors at time $T$ with the rv $M := \sum_{i=1}^m Y_i$. The
actual loss if company $i$ defaults, termed loss given default (LGD) in practice, is
modelled by the random quantity $\delta_i e_i$, where $e_i$ represents the overall exposure to
company $i$ and $0 \le \delta_i \le 1$ represents a random proportion of the exposure which
is lost in the event of default. We will denote the overall loss by $L := \sum_{i=1}^m \delta_i e_i Y_i$
and make further assumptions about the $e_i$ and $\delta_i$ variables as and when we need
them.

It is possible to set up different credit risk models leading to the same multivariate
distribution of $S$ or $Y$. Since this distribution is the main object of interest in the
analysis of portfolio credit risk, we call two models with state vectors $S$ and $\tilde{S}$ (or
$Y$ and $\tilde{Y}$) equivalent if $S \stackrel{d}{=} \tilde{S}$ (or $Y \stackrel{d}{=} \tilde{Y}$).
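The notation of this subsection translates directly into code. The sketch below evaluates the default correlation (8.20) from marginal and joint default probabilities and computes the number of defaults $M$ and the overall loss $L$ for a single scenario; all names and numbers are hypothetical.

```python
from math import sqrt

def default_correlation(p_i, p_j, joint):
    """Default correlation (8.20), where joint = E(Y_i Y_j) = P(Y_i = 1, Y_j = 1)."""
    return (joint - p_i * p_j) / sqrt((p_i - p_i**2) * (p_j - p_j**2))

def portfolio_outcome(defaults, exposures, lgd_fractions):
    """Return (M, L): number of defaults and overall loss for one scenario."""
    m_count = sum(defaults)
    loss = sum(d * e * y for d, e, y in zip(lgd_fractions, exposures, defaults))
    return m_count, loss

# Hypothetical three-obligor portfolio in which obligors 1 and 3 default.
M, L = portfolio_outcome(defaults=[1, 0, 1],
                         exposures=[100.0, 250.0, 50.0],
                         lgd_fractions=[0.6, 0.4, 0.5])
```

Independent defaults give a default correlation of zero, while two firms that always default together have a default correlation of one.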


The exchangeable special case. To simplify the analysis we will often assume that 
the state indicator S, and thus the default indicator Y, are exchangeable. This seems 
the correct way to mathematically formalize the notion of homogeneous groups 
that is used in practice. Recall that a random vector S is said to be exchangeable 
if $(S_1, \dots, S_m) \stackrel{d}{=} (S_{\Pi(1)}, \dots, S_{\Pi(m)})$ for any permutation $(\Pi(1), \dots, \Pi(m))$ of
$(1, \dots, m)$. Exchangeability implies in particular that, for any $k \in \{1, \dots, m - 1\}$,
all of the $\binom{m}{k}$ possible $k$-dimensional marginal distributions of $S$ are identical.
In this situation we introduce a simple notation for default probabilities where
$\pi := P(Y_i = 1)$, $i \in \{1, \dots, m\}$, is the default probability of any firm and

$$\pi_k := P(Y_{i_1} = 1, \dots, Y_{i_k} = 1), \quad \{i_1, \dots, i_k\} \subset \{1, \dots, m\}, \quad 2 \le k \le m, \tag{8.21}$$

is the joint default probability for $k$ firms. In other words, $\pi_k$ is the probability that
an arbitrarily selected subgroup of $k$ companies defaults in $[0, T]$. When default
indicators are exchangeable, we get

$$E(Y_i) = E(Y_i^2) = P(Y_i = 1) = \pi, \quad \forall i,$$
$$E(Y_i Y_j) = P(Y_i = 1, Y_j = 1) = \pi_2, \quad \forall i \ne j,$$

so that $\operatorname{cov}(Y_i, Y_j) = \pi_2 - \pi^2$; this implies that the default correlation in (8.20) is
given by

$$\rho_Y := \rho(Y_i, Y_j) = \frac{\pi_2 - \pi^2}{\pi - \pi^2}, \quad i \ne j, \tag{8.22}$$

which is a simple function of the first- and second-order default probabilities.


8.3.2 Threshold Models and Copulas 


We start with a general definition of a threshold model before discussing the link to 
copulas. 


Definition 8.4. Let $X = (X_1, \dots, X_m)'$ be an $m$-dimensional random vector and
let $D \in \mathbb{R}^{m \times n}$ be a deterministic matrix with elements $d_{ij}$ such that, for every $i$, the
elements of the $i$th row form a set of increasing thresholds satisfying $d_{i1} < \cdots < d_{in}$.
Augment these thresholds by setting $d_{i0} = -\infty$ and $d_{i(n+1)} = \infty$ for all obligors
and then set

$$S_i = j \iff d_{ij} < X_i \le d_{i(j+1)}, \quad j \in \{0, \dots, n\}, \quad i \in \{1, \dots, m\}.$$

Then $(X, D)$ is said to define a threshold model for the state vector $S = (S_1, \dots, S_m)'$.

We refer to $X$ as the vector of critical variables and denote its marginal dfs by
$F_i(x) = P(X_i \le x)$. The $i$th row of $D$ contains the critical thresholds for firm $i$.
By definition, default (corresponding to the event $S_i = 0$) occurs if $X_i \le d_{i1}$, so that
the default probability of company $i$ is given by $p_i = F_i(d_{i1})$.

In the context of such models it is important to distinguish the default correlation 
$\rho(Y_i, Y_j)$ of two firms $i \ne j$ from the so-called asset correlation (the correlation
of the critical variables $X_i$ and $X_j$). For given default probabilities, $\rho(Y_i, Y_j)$ is
determined by $E(Y_i Y_j)$ according to (8.20), and in a threshold model $E(Y_i Y_j) =
P(X_i \le d_{i1}, X_j \le d_{j1})$, so default correlation depends on the joint distribution of
$X_i$ and $X_j$. If $X$ is multivariate normal, as in the CreditMetrics/KMV-type models,
the correlation of $X_i$ and $X_j$ determines the copula of their joint distribution and
hence the default correlation (see Lemma 8.5 below). For general critical variables
outside the multivariate normal class, the correlation of the critical variables does
not fully determine the default correlation; this can have serious implications for the
tail of the distribution of $M = \sum_{i=1}^m Y_i$, as will be shown in Section 8.3.5.
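The distinction between asset correlation and default correlation can be made concrete in the bivariate normal case. The sketch below, which is only an illustration with hypothetical names and a simple midpoint-rule integration (valid for asset correlations strictly between -1 and 1), computes the joint default probability $P(X_i \le d_{i1}, X_j \le d_{j1})$ and inserts it into (8.20).

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def bivariate_normal_cdf(a, b, rho, steps=4000):
    """P(X_i <= a, X_j <= b) for standard normal margins with correlation rho.

    Midpoint rule applied to the integral over x <= a of
    phi(x) * Phi((b - rho * x) / sqrt(1 - rho**2)).
    """
    lower = -10.0  # the normal density is negligible below this point
    h = (a - lower) / steps
    scale = sqrt(1.0 - rho * rho)
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        total += N01.pdf(x) * N01.cdf((b - rho * x) / scale)
    return total * h

def gaussian_default_correlation(p_i, p_j, rho):
    """Default correlation (8.20) when (X_i, X_j) is bivariate normal."""
    d_i, d_j = N01.inv_cdf(p_i), N01.inv_cdf(p_j)
    joint = bivariate_normal_cdf(d_i, d_j, rho)
    return (joint - p_i * p_j) / sqrt((p_i - p_i**2) * (p_j - p_j**2))

# Hypothetical inputs: 1% default probabilities and an asset correlation of 0.3.
rho_default = gaussian_default_correlation(0.01, 0.01, 0.3)
```

For small default probabilities the resulting default correlation is typically much smaller than the asset correlation that generates it.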
We now give a simple criterion for equivalence of two threshold models in terms of
the marginal distributions of the state vector S and the copula of X. While straightfor- 
ward from a mathematical viewpoint, the result is useful for studying the structural 
similarities between various industry models for portfolio credit risk management. 
For the necessary background information on copulas we refer to Chapter 5. 


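Lemma 8.5. Let $(X, D)$ and $(\tilde{X}, \tilde{D})$ be a pair of threshold models with state vectors
$S = (S_1, \dots, S_m)'$ and $\tilde{S} = (\tilde{S}_1, \dots, \tilde{S}_m)'$, respectively. The models are equivalent
if the following conditions hold.

(i) The marginal distributions of the random vectors $S$ and $\tilde{S}$ coincide, i.e.
$P(S_i = j) = P(\tilde{S}_i = j)$, $j \in \{1, \dots, n\}$, $i \in \{1, \dots, m\}$.
(ii) $X$ and $\tilde{X}$ admit the same copula $C$.

Proof. According to Definition 8.4, $S \stackrel{d}{=} \tilde{S}$ if and only if, for all $j_1, \dots, j_m \in
\{1, \dots, n\}$,

$$P(d_{1j_1} < X_1 \le d_{1(j_1+1)}, \dots, d_{mj_m} < X_m \le d_{m(j_m+1)})
= P(\tilde{d}_{1j_1} < \tilde{X}_1 \le \tilde{d}_{1(j_1+1)}, \dots, \tilde{d}_{mj_m} < \tilde{X}_m \le \tilde{d}_{m(j_m+1)}).$$

By standard measure-theoretic arguments this holds if, for all $j_1, \dots, j_m \in
\{1, \dots, n\}$,

$$P(X_1 \le d_{1j_1}, \dots, X_m \le d_{mj_m}) = P(\tilde{X}_1 \le \tilde{d}_{1j_1}, \dots, \tilde{X}_m \le \tilde{d}_{mj_m}).$$

By Sklar’s theorem (Theorem 5.3) this is equivalent to

$$C(F_1(d_{1j_1}), \dots, F_m(d_{mj_m})) = C(\tilde{F}_1(\tilde{d}_{1j_1}), \dots, \tilde{F}_m(\tilde{d}_{mj_m})),$$

where $C$ is the copula of $X$ and $\tilde{X}$ (using condition (ii)). Condition (i) implies that
$F_i(d_{ij}) = \tilde{F}_i(\tilde{d}_{ij})$ for all $j \in \{1, \dots, n\}$, $i \in \{1, \dots, m\}$, and the claim follows.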


The copula in a threshold model determines the link between marginal probabili- 
ties of migration for individual firms and joint probabilities of migration for groups 
of firms. Consider for simplicity a two-state model for default and non-default and
a subgroup of $k$ companies $\{i_1, \dots, i_k\} \subset \{1, \dots, m\}$ with individual default
probabilities $p_{i_1}, \dots, p_{i_k}$. Then

$$P(Y_{i_1} = 1, \dots, Y_{i_k} = 1) = P(X_{i_1} \le d_{i_1 1}, \dots, X_{i_k} \le d_{i_k 1}) = C_{i_1 \cdots i_k}(p_{i_1}, \dots, p_{i_k}), \tag{8.23}$$

where $C_{i_1 \cdots i_k}$ denotes the corresponding $k$-dimensional margin of $C$. As a special
case consider now a model for a single homogeneous group. We assume that $X$ has an
exchangeable copula (i.e. a copula of the form (5.18)) and that all individual default
probabilities are equal to some constant $\pi$ so that the default indicator vector $Y$ is
exchangeable. The formula (8.23) reduces to the useful formula

$$\pi_k = C_{1 \cdots k}(\pi, \dots, \pi), \quad 2 \le k \le m, \tag{8.24}$$


which will be used for the calibration of some copula models later on. 
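As an illustration of (8.24), the sketch below evaluates $\pi_k$ for one particular exchangeable copula, an equicorrelated Gauss copula with correlation parameter between 0 and 1. This choice is an assumption made for the example only. It uses the representation $X_i = \sqrt{\rho} Z + \sqrt{1-\rho}\,\varepsilon_i$ with independent standard normal $Z$ and $\varepsilon_i$, under which defaults are independent given $Z$; the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def joint_default_prob(pi, rho, k, steps=4000):
    """pi_k in (8.24) for an equicorrelated Gauss copula (assumed example).

    Given Z = z, each firm defaults independently with probability
    Phi((d - sqrt(rho) * z) / sqrt(1 - rho)), where d is the normal quantile of pi;
    pi_k integrates the k-th power of this probability against the normal density.
    """
    d = N01.inv_cdf(pi)
    lower, upper = -10.0, 10.0
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        z = lower + (i + 0.5) * h
        cond_pd = N01.cdf((d - sqrt(rho) * z) / sqrt(1.0 - rho))
        total += cond_pd**k * N01.pdf(z)
    return total * h

# Hypothetical homogeneous group: pi = 2%, correlation parameter 0.3.
pi, rho = 0.02, 0.3
pi_2 = joint_default_prob(pi, rho, 2)
rho_y = (pi_2 - pi**2) / (pi - pi**2)  # default correlation via (8.22)
```

Running the calculation in reverse, that is, choosing the correlation parameter so that $\pi_2$ or the default correlation matches a target value, is one way such a copula model might be calibrated.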
8.3.3 Industry Examples


As we have remarked, a number of popular industry models fit into the general 
framework of threshold models and we give some more detail in this section. 


Example 8.6 (CreditMetrics and KMV models). The portfolio versions of the 
KMV and CreditMetrics models introduced in Section 8.2.5 use a similar mecha- 
nism to model the joint distribution of defaults; they differ only with respect to the 
approach used for the determination of individual default probabilities. 

In both models the vector of critical variables $X$ is assumed to have a multivariate
normal distribution and $X_i$ can be interpreted as a change in asset value for obligor $i$
over the time horizon of interest; $d_{i1}$ is chosen so that the probability that $X_i \le d_{i1}$
matches the given default probability $p_i$ for company $i$. Obviously, both the
CreditMetrics and KMV models work with a Gauss copula for the critical variables
$X$ and are hence structurally similar. In particular, by Proposition 8.5 the two-state
versions of the models are equivalent, provided that the individual default
probabilities $p_1, \dots, p_m$ and the correlation matrix $P$ of $X$ are identical.

In both models the covariance matrix of X is calibrated using a factor model of 
the kind described in Section 3.4.1. Assume that we have transformed the critical 
variables and thresholds in such a way that the margins of X are standard normal. 
It is assumed that X can be written as 


$$X = BF + \varepsilon \tag{8.25}$$


for a $p$-dimensional random vector of common factors $F \sim N_p(0, \Omega)$ with $p < m$,
a loading matrix $B \in \mathbb{R}^{m \times p}$, and an $m$-dimensional vector of independent univariate
normally distributed errors $\varepsilon$, which are also independent of $F$. Here the random vector
$F$ represents country and industry effects. Obviously, the factor structure (8.25)
implies that the covariance matrix $P$ of $X$ (which will be a correlation matrix due to
our assumptions on the marginal distributions of $X$) is of the form $P = B \Omega B' + \Upsilon$,
where $\Upsilon$ is the diagonal covariance matrix of $\varepsilon$.

Writing $b_i = (b_{i1}, \dots, b_{ip})'$ for the $i$th row of $B$, the $i$th critical variable has the
structure $X_i = b_i' F + \varepsilon_i$. Recalling that $\operatorname{var}(X_i) = 1$, it follows that


$$\beta_i := b_i' \Omega b_i \tag{8.26}$$


can be viewed as the systematic risk of $X_i$: that is, the part of the variance of $X_i$
which is explained by the common factors $F$. The idiosyncratic risk not explained
by the common factors is $\operatorname{var}(\varepsilon_i) = 1 - \beta_i$.
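
The decomposition into systematic and idiosyncratic risk can be illustrated numerically. The following is a minimal sketch, not the calibration procedure of either industry model: the loading matrix `B`, the factor covariance matrix `OMEGA` and the helper `bilinear` are hypothetical values and names chosen only for illustration. It computes the systematic risk (8.26) for each obligor and assembles the implied correlation matrix of the critical variables.

```python
# Sketch of the factor structure (8.25)-(8.26); all numbers are made up.

def bilinear(b_i, omega, b_j):
    """Return b_i' * omega * b_j for two vectors and a square matrix."""
    omega_b_j = [sum(o * x for o, x in zip(row, b_j)) for row in omega]
    return sum(x * y for x, y in zip(b_i, omega_b_j))

# Hypothetical inputs: m = 3 obligors, p = 2 common factors.
OMEGA = [[1.0, 0.3],
         [0.3, 1.0]]          # covariance matrix of the factors F
B = [[0.5, 0.2],
     [0.1, 0.6],
     [0.4, 0.4]]              # loading matrix, one row b_i per obligor

# Systematic risk beta_i = b_i' Omega b_i and idiosyncratic variance 1 - beta_i.
beta = [bilinear(b, OMEGA, b) for b in B]
idiosyncratic = [1.0 - b for b in beta]

# Correlation matrix P = B Omega B' + Upsilon, with Upsilon diagonal.
m = len(B)
P = [[bilinear(B[i], OMEGA, B[j]) + (idiosyncratic[i] if i == j else 0.0)
      for j in range(m)] for i in range(m)]

for i in range(m):
    print(f"obligor {i + 1}: systematic {beta[i]:.3f}, idiosyncratic {idiosyncratic[i]:.3f}")
```

Because the idiosyncratic variances are set to $1 - \beta_i$, the diagonal of `P` equals one by construction, which is exactly the requirement that the margins of $X$ are standard normal. The loadings must be chosen so that every $\beta_i$ lies in $[0, 1]$.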

In the factor model employed by KMV the factors are assumed to be observable, 
and a time series of factor returns is constructed by forming appropriate indices 
of asset values of publicly traded companies. The factor weights comprising B are 
determined using non-quantitative economic arguments combined with regression 
techniques; some details can be found in Kealhofer and Bohn (2001). 


Example 8.7 (Li’s model). This model, proposed in Li (2001), is a simple dynamic 
model used by practitioners to price basket credit derivatives. The author interprets 
the critical variable $X_i$ as the default time of company $i$ and assumes that $X_i$ is
exponentially distributed with parameter $\lambda_i$ so that $F_i(t) = 1 - \exp(-\lambda_i t)$. Obviously,
company $i$ defaults by time $T$ if and only if $X_i \le T$, so $p_i = F_i(T)$. To
determine the multivariate distribution of $X$, Li assumes that $X$ has the Gauss copula
$C_P^{\mathrm{Ga}}$ for some correlation matrix $P$ (see, for example, (5.9) in Section 5.1.2)
so that $P(X_1 \le t_1, \dots, X_m \le t_m) = C_P^{\mathrm{Ga}}(F_1(t_1), \dots, F_m(t_m))$. It is immediate
from Lemma 8.5 that in Li’s model the distribution of the default indicators at some
fixed horizon T is equivalent to a model of CreditMetrics/KMV type, provided that 
individual default probabilities coincide and that the correlation matrix of the asset- 
value change X in the KMV-type model equals P. This equivalence is often used 
to calibrate Li’s model. We will have a closer look at the model in our analysis of 
dynamic copula models in Section 9.7. 
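
The equivalence between Li's model and a Gaussian threshold model at a fixed horizon can be checked by simulation. The sketch below is an illustration under stated assumptions rather than a pricing implementation: it uses only two names, and the intensities `LAMBDAS`, the correlation `RHO` and the horizon `T` are hypothetical values. Default times are generated from correlated standard normal variables through the exponential quantile function, and the event "default time at most $T$" is compared with the event "normal variable at most $\Phi^{-1}(p_i)$".

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def simulate_li(lambdas, rho, n_sims, rng):
    """Draw (normal variables, default times) for two names with exponential
    margins and a bivariate Gauss copula with correlation rho."""
    draws = []
    for _ in range(n_sims):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        times = []
        for z, lam in zip((z1, z2), lambdas):
            u = min(STD_NORMAL.cdf(z), 1.0 - 1e-15)   # guard against log(0)
            times.append(-math.log1p(-u) / lam)        # exponential quantile
        draws.append(((z1, z2), tuple(times)))
    return draws

# Hypothetical parameters.
LAMBDAS, RHO, T = (0.02, 0.05), 0.4, 1.0
p = [1.0 - math.exp(-lam * T) for lam in LAMBDAS]      # p_i = F_i(T)
d = [STD_NORMAL.inv_cdf(p_i) for p_i in p]             # Gaussian thresholds

rng = random.Random(1)
draws = simulate_li(LAMBDAS, RHO, 20000, rng)

# Compare the default indicators defined through times and through thresholds.
mismatches = sum(
    1
    for z, times in draws
    for i in range(2)
    if (times[i] <= T) != (z[i] <= d[i])
)
default_freq = [sum(1 for _, times in draws if times[i] <= T) / len(draws)
                for i in range(2)]
print(mismatches, default_freq, p)
```

The two definitions of the default indicator agree draw by draw, up to floating-point rounding exactly at the threshold, which is the content of the equivalence used to calibrate the model.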


8.3.4 Models Based on Alternative Copulas 


While most threshold models used in industry are based explicitly or implicitly on 
the Gauss copula, there is no reason why we have to assume a Gauss copula. In fact, 
simulations presented in Section 8.3.5 show that the choice of copula may be very 
critical to the tail of the distribution of the number of defaults M. We now look at 
threshold models based on alternative copulas. 

The first class of model attempts to preserve some of the flexibility of models of 
KMV/CreditMetrics type, which do have the appealing feature that they can accom- 
modate a wide range of different correlation structures for the critical variables. This 
is clearly an advantage in modelling a portfolio where obligors are exposed to sev- 
eral risk factors and where the exposure to different risk factors differs markedly 
across obligors, such as a portfolio of loans to companies from different industry 
sectors or countries. 


Example 8.8 (normal mean-variance mixtures). For the distribution of the critical 
variables we consider the kind of model described in Section 3.2.2. We start with 
an $m$-dimensional multivariate normal vector $Z \sim N_m(0, \Sigma)$ and a positive, scalar
rv $W$, which is independent of $Z$. The vector of critical variables $X$ is assumed to
have the structure


$$X = m(W) + \sqrt{W} Z, \tag{8.27}$$


where $m : [0, \infty) \to \mathbb{R}^m$ is a measurable function. In the special case where $m(W)$
takes a constant value m not depending on W, the distribution is called a normal 
variance mixture. 

An important example of a normal variance mixture is the multivariate $t$ distribution,
as discussed in Example 3.7, which is obtained when $W$ has an inverse gamma
distribution, $W \sim \mathrm{Ig}(\tfrac12 \nu, \tfrac12 \nu)$, or equivalently when $\nu / W \sim \chi^2_\nu$. An example of a
general mean-variance mixture is the generalized hyperbolic distribution discussed 
in Section 3.2.3. 

In a normal mean-variance mixture model the default condition may be written 
as 

$$X_i \le d_{i1} \iff Z_i \le \frac{d_{i1}}{\sqrt{W}} - \frac{m_i(W)}{\sqrt{W}} =: \tilde D_i, \tag{8.28}$$

where $m_i(W)$ is the $i$th component of $m(W)$. A possible economic interpretation of
the model (8.27) is to consider $Z_i$ as the asset value of company $i$ and $d_{i1}$ as an a
priori estimate of the corresponding default threshold. The actual default threshold
is stochastic and is represented by $\tilde D_i$, which is obtained by applying a multiplicative
and an additive shock to the estimate $d_{i1}$. If we interpret this shock as a stylized
representation of global factors such as the overall liquidity and risk appetite in the 
banking system, it makes sense to assume that the shocks to the default thresholds 
of different obligors are driven by the same rv W. 

Normal variance mixtures, such as the multivariate t, provide the most tractable 
examples of normal mean-variance mixtures; they admit a similar calibration 
approach using linear factor models to models based on the Gauss copula. In nor- 
mal variance mixture models the correlation matrices of X (when defined) and Z 
coincide. Moreover, if Z follows a linear factor model (8.25), then X inherits the 
linear factor structure from $Z$. Note, however, that the systematic factors $\sqrt{W} F$ and
the idiosyncratic factors $\sqrt{W} \varepsilon$ are no longer independent but merely uncorrelated.

A threshold model based on the t copula can be thought of as containing the 
standard KMV/CreditMetrics model based on the Gauss copula as a limiting case 
as $\nu \to \infty$. However, the additional parameter $\nu$ adds a great deal of flexibility to
the model. We will come back to this point in Section 8.3.5.


Another class of parametric copulas that could be used in threshold models is the 
Archimedean family of Section 5.4. 


Example 8.9 (Archimedean copulas). Recall that an Archimedean copula is the 
distribution function of a uniform random vector of the form 


$$C(u_1, \dots, u_m) = \phi^{-1}(\phi(u_1) + \cdots + \phi(u_m)), \tag{8.29}$$


where $\phi : [0, 1] \to [0, \infty]$ is a continuous, strictly decreasing function, known as
the copula generator, and $\phi^{-1}$ is its inverse. We assume that $\phi(0) = \infty$, $\phi(1) = 0$ and
that $\phi^{-1}$ is completely monotonic (see equation (5.39) and surrounding discussion).
As explained in Section 5.4, these conditions ensure that (8.29) defines a copula for
any portfolio size $m$. Our main example in this chapter will be Clayton’s copula.
Recall from Section 5.4 that the Clayton copula has generator $\phi_\theta(t) = t^{-\theta} - 1$,
where $\theta > 0$, leading to the copula


$$C_\theta^{\mathrm{Cl}}(u_1, \dots, u_m) = (u_1^{-\theta} + \cdots + u_m^{-\theta} + 1 - m)^{-1/\theta}. \tag{8.30}$$


As discussed in Section 5.4, exchangeable Archimedean copulas suffer from the 
deficiency that they are not rich in parameters and can model only exchangeable 
dependence and not a fully flexible dependence structure for the critical variables. 
Nonetheless, they yield useful parsimonious models for relatively small homoge- 
neous portfolios, which are easy to calibrate and simulate, as we discuss in more 
detail in Section 8.4.4. 

Suppose that $X$ is a random vector with an Archimedean copula and with marginal
distributions $F_i$, $1 \le i \le m$, so that $(X, D)$ specifies a threshold model with individual
default probabilities $F_i(d_{i1})$. As a particular example consider the Clayton
copula and assume a homogeneous situation where all individual default probabilities
are identical to $\pi$. Using relation (8.23), we can calculate that the probability
that an arbitrarily selected group of $k$ obligors from a portfolio of $m$ such obligors
defaults over the time horizon is given by $\pi_k = (k \pi^{-\theta} - k + 1)^{-1/\theta}$. Essentially,
the dependent default mechanism of the homogeneous group is now determined
by this equation and the parameters $\pi$ and $\theta$. We study this Clayton copula model
further in Example 8.22. 
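
The joint default probabilities of the homogeneous Clayton model are simple to evaluate. The sketch below implements the formula for $\pi_k$ stated above; the function names and the parameter values are illustrative assumptions. It also reports the default correlation using the standard correlation of two Bernoulli indicators, $(\pi_2 - \pi^2)/(\pi - \pi^2)$, which is assumed here to coincide with the quantity $\rho_Y$ that the text defines in (8.22).

```python
def clayton_joint_default_prob(pi, k, theta):
    """pi_k = (k * pi^(-theta) - k + 1)^(-1/theta) for the exchangeable Clayton model."""
    return (k * pi ** (-theta) - k + 1.0) ** (-1.0 / theta)

def default_correlation(pi, pi_2):
    """Correlation of two default indicators with P(Y=1) = pi and P(both) = pi_2."""
    return (pi_2 - pi * pi) / (pi - pi * pi)

PI = 0.005                      # hypothetical individual default probability
for theta in (0.01, 0.1, 0.5):  # hypothetical Clayton parameters
    pi_2 = clayton_joint_default_prob(PI, 2, theta)
    print(f"theta={theta}: pi_2={pi_2:.3e}, rho_Y={default_correlation(PI, pi_2):.4f}")
```

For $k = 1$ the formula returns $\pi$ itself, and as $\theta$ tends to zero $\pi_2$ approaches $\pi^2$, the value under independence, so $\theta$ acts as the single dependence parameter of the group.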


8.3.5 Model Risk Issues 


Recall from Chapter 2 that model risk may be roughly defined as the risk associated 
with working with misspecified models—in our case, models which are a poor 
representation of the true mechanism governing defaults and migrations in a credit 
portfolio. For example, if we intend to use our models to estimate measures of tail 
risk, like VaR and expected shortfall, then we should be particularly concerned with 
the possibility that they might underestimate the tail of the portfolio loss distribution. 

As we have seen, a threshold model essentially consists of a collection of default 
(and migration) probabilities for individual firms and a copula that describes the 
dependence of certain critical variables. In discussing model risk in this context we 
will concentrate on models for default only and assume that individual default prob- 
abilities have been satisfactorily determined. It is much more difficult to determine 
the copula describing default dependence and we will look at model risk associated 
with the misspecification of this component of the threshold model. 


The impact of the choice of copula. Since most threshold models used in industry 
use the Gauss copula, we are particularly interested in the sensitivity of the dis- 
tribution of the number of defaults M with respect to the assumption of Gaussian 
dependence. Our interest is motivated by the observation made in Section 5.3.1 that, 
by assuming a Gaussian dependence structure, we may underestimate the probabil- 
ity of joint large movements of risk factors, with potentially drastic implications for 
the performance of risk-management models. 

We compare a model with multivariate normal critical variables and a model 
where the critical variables are multivariate t. For simplicity we consider a homoge- 
neous group model with factor structure, which we now describe. Given a standard 
normal rv $F$, an iid sequence $\varepsilon_1, \dots, \varepsilon_m$ of standard normal variates independent
of $F$ and an asset correlation parameter $\rho \in [0, 1]$, we define a random vector $Z$ by
$Z_i = \sqrt{\rho}\, F + \sqrt{1 - \rho}\, \varepsilon_i$. Observe that this vector follows the so-called equicorrelation
factor model described in Example 3.34 and equation (3.63).

In the $t$ copula case we define the critical variables $X_i := \sqrt{W} Z_i$, where
$W \sim \mathrm{Ig}(\tfrac12 \nu, \tfrac12 \nu)$ is independent of $Z$, so that $X$ has a multivariate $t$ distribution.
In the Gauss copula case we simply set $X := Z$. In both cases we choose thresholds
so that $P(Y_i = 1) = \pi$ for all $i$ and for some $\pi \in (0, 1)$. Note that the correlation
matrix $P$ of $X$ (the asset correlation matrix) is identical in both models and is given
by an equicorrelation matrix with off-diagonal element $\rho$. However, the copula of
$X$ differs, and we expect more joint defaults in the $t$ model due to the higher level
of dependence in the joint tail of the $t$ copula.
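
The comparison just described can be sketched in a few lines of code. This is not the simulation used to produce the tables in this section: it uses a smaller portfolio and far fewer replications, the threshold in the $t$ case is calibrated by Monte Carlo rather than from the exact $t$ quantile, and all function names are hypothetical. The sketch conditions on $(F, W)$, which makes defaults independent with probability $\Phi((d/\sqrt{W} - \sqrt{\rho} F)/\sqrt{1 - \rho})$.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def draw_w(nu, rng):
    """W ~ Ig(nu/2, nu/2), i.e. nu / W is chi-squared with nu degrees of freedom."""
    return nu / rng.gammavariate(nu / 2.0, 2.0)   # Gamma(shape nu/2, scale 2)

def calibrate_t_threshold(pi, nu, rng, n_w=4000, n_iter=40):
    """Find d with E[Phi(d / sqrt(W))] = pi by bisection on a fixed sample of W."""
    inv_sqrt_w = [1.0 / math.sqrt(draw_w(nu, rng)) for _ in range(n_w)]
    lo, hi = -50.0, 0.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        prob = sum(STD_NORMAL.cdf(mid * s) for s in inv_sqrt_w) / n_w
        lo, hi = (mid, hi) if prob < pi else (lo, mid)
    return 0.5 * (lo + hi)

def simulate_default_counts(m, pi, rho, nu, n_sims, rng):
    """Simulate the number of defaults M; nu=None selects the Gauss copula."""
    d = STD_NORMAL.inv_cdf(pi) if nu is None else calibrate_t_threshold(pi, nu, rng)
    counts = []
    for _ in range(n_sims):
        f = rng.gauss(0.0, 1.0)
        w = 1.0 if nu is None else draw_w(nu, rng)
        # Given (F, W): X_i <= d  <=>  eps_i <= (d/sqrt(W) - sqrt(rho) F)/sqrt(1-rho)
        q = STD_NORMAL.cdf((d / math.sqrt(w) - math.sqrt(rho) * f) / math.sqrt(1.0 - rho))
        counts.append(sum(1 for _ in range(m) if rng.random() < q))
    return counts

def empirical_quantile(values, alpha):
    ordered = sorted(values)
    return ordered[max(0, math.ceil(alpha * len(ordered)) - 1)]

rng = random.Random(2024)
M_FIRMS, PI, RHO = 1000, 0.005, 0.038     # group B inputs, reduced portfolio size
gauss_counts = simulate_default_counts(M_FIRMS, PI, RHO, None, 1000, rng)
t10_counts = simulate_default_counts(M_FIRMS, PI, RHO, 10, 1000, rng)
q99_gauss = empirical_quantile(gauss_counts, 0.99)
q99_t10 = empirical_quantile(t10_counts, 0.99)
print("99% quantile of M: Gauss", q99_gauss, " t with nu=10", q99_t10)
```

Both models share the same individual default probability and asset correlation, so any difference in the upper quantiles of the simulated default counts is attributable to the copula alone.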
Table 8.4. Results of simulation study. We tabulate the estimated 95th and 99th percentiles
of the distribution of $M$ in an exchangeable model with 10 000 firms. The values for the
default probability $\pi$ and the asset correlation $\rho$ corresponding to the three groups A, B and
C are given in the text.


                       $q_{0.95}(M)$                                  $q_{0.99}(M)$
Group    $\nu = \infty$   $\nu = 50$   $\nu = 10$      $\nu = \infty$   $\nu = 50$   $\nu = 10$

A              14             23           24               21             49          118
B             109            153          239              157            261          589
C            1618           1723         2085             2206           2400         3067


We define three groups of decreasing credit quality, labelled A, B and C. These 
groups do not correspond exactly to the A, B and C rating categories used by any 
of the well-known rating agencies, but they are nonetheless realistic values for 
Gaussian threshold models for real obligors. In group A we set $\pi = 0.06\%$ and
$\rho = 2.58\%$; in group B we set $\pi = 0.50\%$ and $\rho = 3.80\%$; in group C we set
$\pi = 7.50\%$ and $\rho = 9.21\%$. We consider a portfolio of size $m = 10\,000$. For each
group we vary the degrees-of-freedom parameter $\nu$. In order to represent the tail
of the number of defaults $M$, we use simulations to determine (approximately) the
95% and 99% quantiles, $q_{0.95}(M)$ and $q_{0.99}(M)$, and tabulate them in Table 8.4.
The actual simulation was performed using a representation of threshold models as 
Bernoulli mixture models that is discussed later in Section 8.4.4. 

Table 8.4 shows that $\nu$ clearly has a massive influence on the high quantiles.
For the important 99% quantile the impact is most pronounced for group A, where
$q_{0.99}(M)$ is increased by a factor of almost six when we go from a Gaussian model
to a model with $\nu = 10$.


The impact of changing asset correlation. Here we stick to the assumption that X 
has a Gauss copula and study the impact of the factor structure of the asset returns 
on joint default events and hence on the tail of M. More specifically, we increase the 
systematic risk component of the critical variables for the obligors in our portfolio 
(see equation (8.26)) and analyse how this affects the tail of M. We use the homoge- 
neous group model introduced above as a vehicle for our analysis. We fix the default 
probability at $\pi = 0.50\%$ (the value for group B above) and vary the asset correlation
$\rho$, which gives the systematic risk for all obligors in the homogeneous group
model, using the values $\rho = 2.58\%$, $\rho = 3.80\%$ and $\rho = 9.21\%$. In Table 8.5 we
tabulate $q_{0.95}(M)$ and $q_{0.99}(M)$ for a portfolio with 10 000 counterparties. Clearly,
varying $\rho$ also has a sizeable effect on the quantiles of $M$. However, this effect is
less drastic and, in particular, less surprising than the impact of varying the copula 
in our previous experiment. 


Table 8.5. Results of simulation study. Estimated 95th and 99th percentiles of the
distribution of $M$ in an exchangeable model for varying values of asset correlation $\rho$.


Quantile          $\rho = 2.58\%$    $\rho = 3.80\%$    $\rho = 9.21\%$

$q_{0.95}(M)$           98                109                148
$q_{0.99}(M)$          133                157                250


Commentary. Both simulation experiments indicate that attempts to calibrate
threshold models using estimates of marginal default probabilities and crude estimates
of the factor structure of the critical variables obtained from asset return data
are prone to substantial model risk. Ideally, historical default data should also be
used to estimate parameters describing default dependence; indeed, the best strategy
might involve combining factor models for asset-value returns with statistical
estimation of some of the key parameters (such as the systematic risk parameter $\beta$
in (8.26)) using historical default data.


Notes and Comments 


Our presentation of threshold models is based, to a large extent, on Frey and McNeil 
(2001, 2003). In those papers we referred to the models as “latent variable” mod- 
els, because of structural similarities with statistical models of that name (see Joe 
1997). However, whereas in statistical latent variable models the critical variables 
are treated as unobserved, in credit models they are often formally identified, for 
example, as asset values or asset-value returns. 

The first systematic study of model risk for credit portfolio models is Gordy 
(2000). Our analysis of the impact of the copula of X on the tail of M follows Frey, 
McNeil and Nyfeler (2001). For an excellent discussion of various aspects of model 
risk in risk management in general we refer to Gibson (2000). 


8.4 The Mixture Model Approach 


In a mixture model the default risk of an obligor is assumed to depend on a set 
of common economic factors, such as macroeconomic variables, which are also 
modelled stochastically. Given a realization of the factors, defaults of individual 
firms are assumed to be independent. Dependence between defaults stems from the 
dependence of individual default probabilities on the set of common factors. We 
start with general definitions of Bernoulli and Poisson mixture models before going 
on to specific examples. 


Definition 8.10 (Bernoulli mixture model). Given some $p < m$ and a $p$-dimensional
random vector $\Psi = (\Psi_1, \dots, \Psi_p)'$, the random vector $Y = (Y_1, \dots, Y_m)'$
follows a Bernoulli mixture model with factor vector $\Psi$ if there are functions
$p_i : \mathbb{R}^p \to [0, 1]$, $1 \le i \le m$, such that conditional on $\Psi$ the components of $Y$ are
independent Bernoulli rvs satisfying $P(Y_i = 1 \mid \Psi = \psi) = p_i(\psi)$.


For $y = (y_1, \dots, y_m)'$ in $\{0, 1\}^m$ we have that


$$P(Y = y \mid \Psi = \psi) = \prod_{i=1}^{m} p_i(\psi)^{y_i} (1 - p_i(\psi))^{1 - y_i}, \tag{8.31}$$
and the unconditional distribution of the default indicator vector $Y$ is obtained by
integrating over the distribution of the factor vector $\Psi$. In particular, the default
probability of company $i$ is given by $p_i = P(Y_i = 1) = E(p_i(\Psi))$.

Since default is a rare event, we also explore the idea of approximating Bernoulli 
rvs with Poisson rvs in Poisson mixture models. Here a company may potentially 
“default more than once” in the period of interest, albeit with a very low probability; 
we will use the notation $\tilde Y_i \in \{0, 1, 2, \dots\}$ for the counting rv giving the number of
“defaults” of company i. The formal definition parallels the definition of a Bernoulli 
mixture model. 


Definition 8.11 (Poisson mixture model). Given $p$ and $\Psi$ as in Definition 8.10, the
random vector $\tilde Y = (\tilde Y_1, \dots, \tilde Y_m)'$ follows a Poisson mixture model with factors $\Psi$
if there are functions $\lambda_i : \mathbb{R}^p \to (0, \infty)$, $1 \le i \le m$, such that conditional on
$\Psi = \psi$ the random vector $\tilde Y$ is a vector of independent Poisson distributed rvs with
rate parameter $\lambda_i(\psi)$.


CreditRisk+, which is discussed in Section 8.4.2, is an industry example of a 
Poisson mixture model. Poisson mixture models also play an important role in 
actuarial mathematics (see Section 10.2.4). 

We define the rv $\tilde M = \sum_{i=1}^{m} \tilde Y_i$ and observe that, for small Poisson parameters $\lambda_i$,
$\tilde M$ is approximately equal to the number of defaulting companies. Given the factors,
it is the sum of conditionally independent Poisson variables and therefore its
distribution satisfies


$$P(\tilde M = k \mid \Psi = \psi) = \exp\Big(-\sum_{i=1}^{m} \lambda_i(\psi)\Big) \frac{\big(\sum_{i=1}^{m} \lambda_i(\psi)\big)^k}{k!}. \tag{8.32}$$


If $\tilde Y$ follows a Poisson mixture model and we define the indicators $Y_i = I_{\{\tilde Y_i \ge 1\}}$,
then $Y$ follows a Bernoulli mixture model and the mixing variables are related by
$p_i(\cdot) = 1 - \exp(-\lambda_i(\cdot))$.

Note that the two-stage hierarchical structure of mixture models facilitates sam- 
pling from the models: first we generate the economic factor realizations, and then 
the pattern of defaults conditional on those realizations. The second step is easy 
because of the conditional independence assumption. 
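
The two-stage sampling scheme described above can be sketched as follows. The example is a minimal illustration under stated assumptions: it uses a single standard normal factor and probit-type conditional default probabilities with hypothetical parameters `MUS` and `SIGMA`, and the helper names are invented for this sketch. It also includes the link between a Poisson rate and the implied conditional default probability.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sample_bernoulli_mixture(cond_probs, draw_factors, rng):
    """Stage 1: draw the factors. Stage 2: draw conditionally independent defaults."""
    psi = draw_factors(rng)
    y = [1 if rng.random() < p_i(psi) else 0 for p_i in cond_probs]
    return psi, y

def poisson_to_bernoulli(rate):
    """Conditional default probability implied by a Poisson rate: 1 - exp(-rate)."""
    return 1.0 - math.exp(-rate)

def make_probit_prob(mu, sigma):
    """Hypothetical conditional default probability Phi(mu + sigma * psi_1)."""
    return lambda psi: STD_NORMAL.cdf(mu + sigma * psi[0])

MUS, SIGMA = [-2.5, -2.0, -1.5], 0.3            # illustrative values for three firms
cond_probs = [make_probit_prob(mu, SIGMA) for mu in MUS]

def draw_factors(rng):
    return [rng.gauss(0.0, 1.0)]                # one standard normal factor

rng = random.Random(7)
n_sims = 20000
default_freq = [0.0] * len(MUS)
for _ in range(n_sims):
    _, y = sample_bernoulli_mixture(cond_probs, draw_factors, rng)
    for i, y_i in enumerate(y):
        default_freq[i] += y_i / n_sims

# For a probit link with a standard normal factor, E(Phi(mu + sigma*Psi)) has the
# closed form Phi(mu / sqrt(1 + sigma^2)), which gives a check on the sampler.
expected = [STD_NORMAL.cdf(mu / math.sqrt(1.0 + SIGMA ** 2)) for mu in MUS]
print(default_freq, expected)
```

The second stage is a plain loop over independent Bernoulli draws, which is what makes sampling from mixture models inexpensive once the factor realization is fixed.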


8.4.1 One-Factor Bernoulli Mixture Models 


In many practical situations it is useful to consider a one-factor model. The infor- 
mation may not always be available to calibrate a model with more factors, and 
one-factor models may be fitted statistically to default data without great difficulty 
(see Section 8.6). Their behaviour for large portfolios is also particularly easy to 
understand, as will be shown in Section 8.4.3. 

Throughout this section, $\Psi$ is an rv with values in $\mathbb{R}$ and $p_i(\psi) : \mathbb{R} \to [0, 1]$
are functions such that, conditional on $\Psi$, the default indicator $Y$ is a vector of
independent Bernoulli rvs with $P(Y_i = 1 \mid \Psi = \psi) = p_i(\psi)$. We now consider a
variety of special cases. 
Exchangeable Bernoulli mixture models. A further simplification occurs if the
functions $p_i$ are all identical. In this case the Bernoulli mixture model is termed
exchangeable, since the random vector $Y$ is exchangeable. It is convenient to introduce
the rv $Q := p_i(\Psi)$ and to denote the distribution function of this mixing
variable by $G(q)$. Conditional on $Q = q$ the number of defaults $M$ is the sum of $m$
independent Bernoulli variables with parameter $q$ and hence has a binomial distribution
with parameters $q$ and $m$, i.e. $P(M = k \mid Q = q) = \binom{m}{k} q^k (1 - q)^{m-k}$. The
unconditional distribution of $M$ is obtained by integrating over $q$. We have


$$P(M = k) = \binom{m}{k} \int_0^1 q^k (1 - q)^{m-k} \, dG(q). \tag{8.33}$$


Using the notation of Section 8.3.1 we can calculate default probabilities and joint
default probabilities for the exchangeable group. Simple calculations give
$\pi = E(Y_1) = E(E(Y_1 \mid Q)) = E(Q)$ and, more generally,


$$\pi_k = P(Y_1 = 1, \dots, Y_k = 1) = E(E(Y_1 \cdots Y_k \mid Q)) = E(Q^k), \tag{8.34}$$


so that unconditional default probabilities of first and higher order are seen to be
moments of the mixing distribution. Moreover, for $i \ne j$,
$\operatorname{cov}(Y_i, Y_j) = \pi_2 - \pi^2 = \operatorname{var}(Q) \ge 0$, which means that in an exchangeable Bernoulli mixture model the
default correlation $\rho_Y$ defined in (8.22) is always non-negative. Any value of $\rho_Y$
in $[0, 1]$ can be obtained by an appropriate choice of the mixing distribution $G$. In
particular, if $\rho_Y = \operatorname{var}(Q) = 0$, the rv $Q$ has a degenerate distribution with all
mass concentrated on the point $\pi$ and the default indicators are independent. The
case $\rho_Y = 1$ corresponds to a model where $\pi = \pi_2$ and the distribution of $Q$ is
concentrated on the points 0 and 1.


Example 8.12 (beta, probit-normal and logit-normal mixtures). The following 
mixing distributions are frequently used in Bernoulli mixture models. 


Beta mixing distribution. Here we assume that Q ~ Beta(a, b) for some param- 
eters a > 0 and b > 0. See Section A.2.1 for more details concerning the beta 
distribution. 


Probit-normal mixing distribution. Here $Q = \Phi(\mu + \sigma \Psi)$ for $\Psi \sim N(0, 1)$,
$\mu \in \mathbb{R}$ and $\sigma > 0$, where $\Phi$ is the standard normal distribution function. It turns
out that this model can be viewed as a one-factor version of the CreditMetrics
and KMV-type models; this is a special case of a general result in Section 8.4.4
(see equation (8.45) in particular).

Logit-normal mixing distribution. Here $Q = F(\mu + \sigma \Psi)$ for $\Psi \sim N(0, 1)$,
$\mu \in \mathbb{R}$ and $\sigma > 0$, where $F(x) = (1 + \exp(-x))^{-1}$ is the df of a so-called
logistic distribution.


In the model with beta mixing distribution, the higher-order default probabilities
$\pi_k$ and the distribution of $M$ can be computed explicitly (see Example 8.13 below).
Calculations for the logit-normal, probit-normal and other models generally require
numerical evaluation of the integrals in (8.33) and (8.34). If we fix any two of
$\pi$, $\pi_2$ and $\rho_Y$ in a beta, logit-normal or probit-normal model, then this fixes the
parameters $a$ and $b$ or $\mu$ and $\sigma$ of the mixing distribution, and higher-order joint
default probabilities are then automatically determined.
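
As an illustration of the numerical evaluation mentioned above, the following sketch approximates the moments $\pi_k = E(Q^k)$ in (8.34) for the probit-normal model with a simple trapezoidal rule. The parameter values and function names are hypothetical, and the quadrature settings are a pragmatic choice rather than a recommendation. The default correlation is computed as $(\pi_2 - \pi^2)/(\pi - \pi^2)$, assumed here to match the definition of $\rho_Y$ in (8.22).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def probit_normal_moment(k, mu, sigma, n_steps=4000, cutoff=10.0):
    """Approximate pi_k = E(Phi(mu + sigma*Psi)^k), Psi ~ N(0,1), by the trapezoidal rule."""
    h = 2.0 * cutoff / n_steps
    total = 0.0
    for j in range(n_steps + 1):
        x = -cutoff + j * h
        weight = 0.5 if j in (0, n_steps) else 1.0
        total += weight * STD_NORMAL.cdf(mu + sigma * x) ** k * STD_NORMAL.pdf(x)
    return total * h

MU, SIGMA = -2.6, 0.25                       # hypothetical mixing parameters
pi_1 = probit_normal_moment(1, MU, SIGMA)
pi_2 = probit_normal_moment(2, MU, SIGMA)
rho_y = (pi_2 - pi_1 ** 2) / (pi_1 - pi_1 ** 2)

# Known closed form for the first moment, used as a check on the quadrature.
pi_exact = STD_NORMAL.cdf(MU / math.sqrt(1.0 + SIGMA ** 2))
print(f"pi={pi_1:.6f} (closed form {pi_exact:.6f}), pi_2={pi_2:.3e}, rho_Y={rho_y:.4f}")
```

Running the calculation in the other direction, that is, solving for $\mu$ and $\sigma$ given target values of $\pi$ and $\rho_Y$, would require a two-dimensional root search around this function.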


Example 8.13 (beta mixing distribution). By definition, the density of a beta 
distribution is given by 


$$g(q) = \frac{1}{\beta(a, b)} q^{a-1} (1 - q)^{b-1}, \quad a, b > 0, \ 0 < q < 1,$$


where $\beta(a, b)$ denotes the beta function. Below we use the fact that the beta function
satisfies the recursion formula $\beta(a + 1, b) = (a/(a + b)) \beta(a, b)$; this is easily established
from the representation of the beta function in terms of the gamma function
in Section A.2.1. Using (8.34) we obtain for the higher-order default probabilities


$$\pi_k = \frac{1}{\beta(a, b)} \int_0^1 q^k q^{a-1} (1 - q)^{b-1} \, dq = \frac{\beta(a + k, b)}{\beta(a, b)}.$$


The recursion formula for the beta function yields $\pi_k = \prod_{j=0}^{k-1} (a + j)/(a + b + j)$;
in particular, $\pi = a/(a + b)$, $\pi_2 = \pi (a + 1)/(a + b + 1)$ and $\rho_Y = (a + b + 1)^{-1}$.
The rv $M$ has a so-called beta-binomial distribution. We obtain from (8.33) that


$$P(M = k) = \binom{m}{k} \frac{1}{\beta(a, b)} \int_0^1 q^{k+a-1} (1 - q)^{m-k+b-1} \, dq = \binom{m}{k} \frac{\beta(a + k, b + m - k)}{\beta(a, b)}. \tag{8.35}$$
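
The explicit formulas of this example translate directly into code. The sketch below evaluates the product formula for $\pi_k$ and the beta-binomial probabilities (8.35) on the log scale, and inverts the relations $\pi = a/(a + b)$ and $\rho_Y = (a + b + 1)^{-1}$ to obtain $a$ and $b$ from target values. The function names and the numerical inputs are illustrative assumptions.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function via the gamma-function representation."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_joint_default_prob(k, a, b):
    """pi_k = prod_{j=0}^{k-1} (a + j) / (a + b + j)."""
    prob = 1.0
    for j in range(k):
        prob *= (a + j) / (a + b + j)
    return prob

def beta_binomial_pmf(k, m, a, b):
    """P(M = k) from (8.35), evaluated on the log scale for numerical stability."""
    log_binom = math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
    return math.exp(log_binom + log_beta(a + k, b + m - k) - log_beta(a, b))

def calibrate_beta(pi, rho_y):
    """Solve pi = a/(a+b) and rho_Y = 1/(a+b+1) for (a, b)."""
    total = 1.0 / rho_y - 1.0          # a + b
    return pi * total, (1.0 - pi) * total

a, b = calibrate_beta(0.005, 0.01)     # hypothetical targets: pi = 0.5%, rho_Y = 1%
m = 100
pmf = [beta_binomial_pmf(k, m, a, b) for k in range(m + 1)]
mean_defaults = sum(k * p for k, p in enumerate(pmf))
print(a, b, beta_joint_default_prob(2, a, b), mean_defaults)
```

Since $E(M) = m\pi$ in any exchangeable model, the mean of the computed distribution gives a quick consistency check on the calibration.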


One-factor models with covariates. It is quite straightforward to construct Bernoulli
mixture models that have a single common mixing variable $\Psi$ but which allow
covariates for individual firms to influence the default probability; these covariates
might be indicators for group membership, such as rating class or industry sector,
or key ratios taken from a company’s balance sheet.

Writing $x_i \in \mathbb{R}^k$ for a vector of deterministic covariates, a typical model for the
conditional default probabilities $p_i(\Psi)$ in (8.31) would be to assume that


$$p_i(\Psi) = h(\mu + \beta' x_i + \sigma \Psi), \tag{8.36}$$


where $h : \mathbb{R} \to (0, 1)$ is a strictly increasing link function, such as $h(x) = \Phi(x)$
or $h(x) = (1 + \exp(-x))^{-1}$, the vector $\beta = (\beta_1, \dots, \beta_k)'$ contains regression
parameters, $\mu \in \mathbb{R}$ is an intercept parameter, and $\sigma > 0$ is a scaling parameter. Such
a specification is commonly used in the class of generalized linear models in statistics
(see Section 8.6.3). We could complete the mixture model by specifying that $\Psi$ is
standard normally distributed, which would mean that the mixing distribution for the
conditional default probability of each individual firm was of either probit-normal
or logit-normal form.

Clearly, if $x_i = x$ for all $i$, so that all risks have the same covariates, then we are
back in the situation of full exchangeability. Note also that, since the function $p_i(\Psi)$
is increasing in $\Psi$, the conditional default probabilities $(p_1(\Psi), \dots, p_m(\Psi))$ form
a comonotonic random vector; hence, in a state of the world where the default probability
is comparatively high for one counterparty, it is high for all counterparties.
For a discussion of comonotonicity we refer to Section 5.1.6. 


Example 8.14 (model for several exchangeable groups). The regression structure 
in (8.36) includes partially exchangeable models where we define a number of groups 
within which risks are exchangeable. These groups might represent rating classes 
according to some internal or rating-agency classification. 

If the covariates $x_i$ are simply $k$-dimensional unit vectors of the form $x_i = e_{r(i)}$,
where $r(i) \in \{1, \dots, k\}$ indicates, say, the rating class of firm $i$, then the model (8.36)
can be written in the form


$$p_i(\Psi) = h(\mu_{r(i)} + \sigma \Psi) \tag{8.37}$$


for parameters $\mu_r := \mu + \beta_r$ for $r = 1, \dots, k$.

Inserting this specification in (8.31) we can find the conditional distribution of 
the default indicator vector. Suppose there are $m_r$ obligors in rating category $r$ for
$r = 1, \dots, k$, and write $M_r$ for the number of defaults. The conditional distribution
of the vector $M = (M_1, \dots, M_k)'$ is given by


$$P(M = l \mid \Psi = \psi) = \prod_{r=1}^{k} \binom{m_r}{l_r} (h(\mu_r + \sigma \psi))^{l_r} (1 - h(\mu_r + \sigma \psi))^{m_r - l_r}, \tag{8.38}$$


where $l = (l_1, \dots, l_k)'$. A model of precisely the form (8.38) will be fitted to
Standard & Poor’s default data in Section 8.6.4. The asymptotic behaviour of such 
a model (when m is large) is investigated in Example 8.17. 


8.4.2 CreditRisk+ 


CreditRisk+ is an industry model for credit risk that was proposed by Credit Suisse 
Financial Products in 1997 (see Credit Suisse Financial Products 1997). The model 
has the structure of a Poisson mixture model, where the factor vector $\Psi$ consists of $p$
independent, gamma-distributed rvs. The distributional assumptions and functional 
forms imposed in CreditRisk+ make it possible to compute the distribution of M 
fairly explicitly using techniques for mixture distributions that are well known in 
actuarial mathematics and which are also discussed in Chapter 10. 


The structure of CreditRisk+. CreditRisk+ is a Poisson mixture model in the 
sense of Definition 8.11. The (stochastic) parameter $\lambda_i(\Psi)$ of the conditional Poisson
distribution for firm $i$ is given by $\lambda_i(\Psi) = k_i w_i' \Psi$ for a constant $k_i > 0$, non-negative
factor weights $w_i = (w_{i1}, \dots, w_{ip})'$ satisfying $\sum_j w_{ij} = 1$, and $p$ independent
$\mathrm{Ga}(\alpha_j, \beta_j)$-distributed factors $\Psi_1, \dots, \Psi_p$ with parameters set to be $\alpha_j = \beta_j = \sigma_j^{-2}$
for some $\sigma_j > 0$. This parametrization of the gamma variables ensures that we
have $E(\Psi_j) = 1$, $\operatorname{var}(\Psi_j) = \sigma_j^2$ and $E(\lambda_i(\Psi)) = k_i E(w_i' \Psi) = k_i$. Observe that
in this model the default probability is given by $P(Y_i = 1) = P(\tilde Y_i > 0) = E(P(\tilde Y_i > 0 \mid \Psi))$.
Since $\tilde Y_i$ is Poisson given $\Psi$, we have that


$$E(P(\tilde Y_i > 0 \mid \Psi)) = E(1 - \exp(-k_i w_i' \Psi)) \approx k_i E(w_i' \Psi) = k_i, \tag{8.39}$$




where the approximation holds because $k_i$ is typically small. Hence $k_i$ is approximately
equal to the default probability for firm $i$.


Gamma-Poisson mixtures. The distribution of $\tilde M = \sum_{i=1}^{m} \tilde Y_i$ in CreditRisk+ is
conditionally Poisson and satisfies


$$\tilde M \mid \Psi = \psi \sim \mathrm{Poi}\Big(\sum_{i=1}^{m} k_i w_i' \psi\Big). \tag{8.40}$$

To compute the unconditional distribution of $\tilde M$ we require a well-known result on
mixed Poisson distributions, which appears as Proposition 10.20 in a discussion
of relevant actuarial methodology for quantitative risk management in Chapter 10.
This result says that if the rv $N$ is conditionally Poisson with a gamma-distributed
rate parameter, $\Lambda \sim \mathrm{Ga}(\alpha, \beta)$, then $N$ has a negative binomial distribution,
$N \sim \mathrm{Nb}(\alpha, \beta/(\beta + 1))$.

In the case when $p = 1$ we may apply this result directly to (8.40) to deduce that
$\tilde M$ has a negative binomial distribution (since a constant times a gamma variable
remains gamma distributed).
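
The gamma-Poisson result can be verified numerically. In the sketch below the negative binomial probabilities are written as $P(N = k) = \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)\, k!} p^{\alpha} (1 - p)^k$ with $p = \beta/(\beta + 1)$, and $\mathrm{Ga}(\alpha, \beta)$ is taken to have rate parameter $\beta$; both conventions are assumptions about the notation of Proposition 10.20, which lies outside this section. The mixed Poisson probability is obtained by integrating the Poisson probability against the gamma density with a midpoint rule. The values of `ALPHA` and `BETA` are hypothetical.

```python
import math

def negative_binomial_pmf(k, alpha, p):
    """P(N = k) = Gamma(alpha + k) / (Gamma(alpha) k!) * p^alpha * (1 - p)^k."""
    log_coeff = math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
    return math.exp(log_coeff + alpha * math.log(p) + k * math.log(1.0 - p))

def mixed_poisson_pmf(k, alpha, beta, upper=20.0, n_steps=20000):
    """Integrate the Poisson(lam) probability of k against the Ga(alpha, beta)
    density (rate beta) with the midpoint rule on (0, upper)."""
    h = upper / n_steps
    log_const = alpha * math.log(beta) - math.lgamma(alpha) - math.lgamma(k + 1)
    total = 0.0
    for j in range(n_steps):
        lam = (j + 0.5) * h
        total += math.exp(log_const + (k + alpha - 1.0) * math.log(lam) - (1.0 + beta) * lam)
    return total * h

ALPHA, BETA = 2.0, 4.0          # hypothetical gamma parameters (mean 0.5)
closed_form = [negative_binomial_pmf(k, ALPHA, BETA / (BETA + 1.0)) for k in range(6)]
numerical = [mixed_poisson_pmf(k, ALPHA, BETA) for k in range(6)]
for k, (c, n) in enumerate(zip(closed_form, numerical)):
    print(k, round(c, 6), round(n, 6))
```

The shape parameter is chosen larger than one so that the integrand stays bounded near zero; for smaller values a quadrature rule adapted to the singularity would be needed.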

For arbitrary $p$ we now show that $\tilde M$ is equal in distribution to a sum of $p$
independent negative binomial rvs. This follows by observing that


$$\sum_{i=1}^{m} k_i w_i' \Psi = \sum_{i=1}^{m} \sum_{j=1}^{p} k_i w_{ij} \Psi_j = \sum_{j=1}^{p} \Psi_j \Big(\sum_{i=1}^{m} k_i w_{ij}\Big).$$


Now consider rvs $\tilde M_1, \dots, \tilde M_p$ such that $\tilde M_j$ is conditionally Poisson with mean
$(\sum_{i=1}^{m} k_i w_{ij}) \psi_j$ conditional on $\Psi_j = \psi_j$. The independence of the components
$\Psi_1, \dots, \Psi_p$ implies that the $\tilde M_j$ are independent, and by construction we have
$\tilde M \overset{d}{=} \sum_{j=1}^{p} \tilde M_j$. Moreover, the rvs $(\sum_{i=1}^{m} k_i w_{ij}) \Psi_j$ are gamma distributed, so that
each of the $\tilde M_j$ has a negative binomial distribution by Proposition 10.20.

This observation is the starting point for the computation of the distribution of $\tilde M$
in CreditRisk+. Using Panjer recursion (see Section 10.2.3), it is in fact possible to
derive simple recursion formulas for the probabilities $P(\tilde M = k)$.
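
The representation as a sum of independent negative binomial variables already gives a way to compute the distribution on a truncated grid. The sketch below uses direct convolution, not the Panjer recursion mentioned in the text, and all inputs are hypothetical. It relies on the scaling property that $c_j \Psi_j$ with $c_j = \sum_i k_i w_{ij}$ is $\mathrm{Ga}(\alpha_j, \beta_j / c_j)$-distributed, so that the $j$th summand is $\mathrm{Nb}(\alpha_j, \beta_j/(\beta_j + c_j))$ with $\alpha_j = \beta_j = \sigma_j^{-2}$, under the same negative binomial convention as assumed for Proposition 10.20.

```python
import math

def negative_binomial_pmf(k, alpha, p):
    """P(N = k) = Gamma(alpha + k) / (Gamma(alpha) k!) * p^alpha * (1 - p)^k."""
    log_coeff = math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
    return math.exp(log_coeff + alpha * math.log(p) + k * math.log(1.0 - p))

def convolve(f, g):
    """Convolution of two pmfs on {0, ..., K}, truncated at K."""
    out = [0.0] * len(f)
    for i, f_i in enumerate(f):
        for j in range(len(f) - i):
            out[i + j] += f_i * g[j]
    return out

def creditrisk_plus_pmf(k_vals, weights, sigmas, k_max):
    """Distribution of the number of 'defaults' as a convolution of p negative
    binomial pmfs; requires every factor to carry positive total weight."""
    pmf = [1.0] + [0.0] * k_max
    for j, sigma in enumerate(sigmas):
        c_j = sum(k_i * w_i[j] for k_i, w_i in zip(k_vals, weights))
        alpha = sigma ** -2                 # alpha_j = beta_j = sigma_j^(-2)
        p_j = alpha / (alpha + c_j)         # beta_j / (beta_j + c_j)
        factor_pmf = [negative_binomial_pmf(n, alpha, p_j) for n in range(k_max + 1)]
        pmf = convolve(pmf, factor_pmf)
    return pmf

# Hypothetical portfolio: four firms, two factors.
K_VALS = [0.01, 0.02, 0.005, 0.03]
WEIGHTS = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.8], [0.0, 1.0]]
SIGMAS = [0.8, 1.2]
pmf = creditrisk_plus_pmf(K_VALS, WEIGHTS, SIGMAS, k_max=30)
mean_count = sum(k * p for k, p in enumerate(pmf))
print(pmf[:4], mean_count)
```

The mean of the computed distribution should equal $\sum_i k_i$, in line with $E(\lambda_i(\Psi)) = k_i$. For large portfolios the quadratic cost of repeated convolution is one reason recursive schemes are preferred in practice.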


8.4.3 Asymptotics for Large Portfolios 


We now provide some asymptotic results for large portfolios in Bernoulli mixture 
models. These results can be used to approximate the credit loss distribution and 
associated risk measures in a large portfolio. Moreover, they are useful for identi- 
fying the crucial parts of a Bernoulli mixture model. In particular, we will see that 
in one-factor models the tail of the loss distribution is essentially determined by the 
tail of the mixing distribution, which has direct consequences for the analysis of 
model risk in mixture models and for the setting of capital-adequacy rules for loan 
books. 

Since we are interested in asymptotic properties of the overall loss distribution, we 
also consider exposures and losses given default. Let $(e_i)_{i \in \mathbb{N}}$ be an infinite sequence
of positive deterministic exposures, let $(Y_i)_{i \in \mathbb{N}}$ be the corresponding sequence of
default indicators, and let $(\delta_i)_{i \in \mathbb{N}}$ be a sequence of rvs with values in $(0, 1]$ representing
percentage losses given that default occurs. In this setting the loss for a portfolio
of size $m$ is given by $L^{(m)} = \sum_{i=1}^{m} L_i$, where $L_i = e_i \delta_i Y_i$ are the individual losses.
We now make some technical assumptions for our model. 


(A1) There is a $p$-dimensional random vector $\Psi$ and functions $\ell_i : \mathbb{R}^p \to [0, 1]$
such that, conditional on $\Psi$, the $(L_i)_{i \in \mathbb{N}}$ form a sequence of independent rvs
with mean $\ell_i(\psi) = E(L_i \mid \Psi = \psi)$.


In this assumption the conditional independence structure is extended from the 
default indicators to the losses. Note that it is not assumed that losses given default 
6; and default indicators are independent, an assumption which is made in many 
standard models. In particular, (A1) allows for the situation where Y; and ô; are only 
conditionally independent given W, such that £; (Y) = 6;(W) pi(w), where ôi (Y) 
gives the expected percentage loss given default, given W = yw. This extension 
is relevant from an empirical viewpoint since evidence suggests that losses given 
default tend to depend on the state of the underlying economy (see Notes and Com- 
ments). 


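(A2) There is a function $\bar\ell : \mathbb{R}^p \to \mathbb{R}_+$ such that

\[ \lim_{m \to \infty} \frac{1}{m} E(L^{(m)} \mid \Psi = \psi) = \lim_{m \to \infty} \frac{1}{m} \sum_{i=1}^m \ell_i(\psi) = \bar\ell(\psi) \]

for all $\psi \in \mathbb{R}^p$. We call $\bar\ell(\psi)$ the asymptotic conditional loss function.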


Assumption (A2) implies that we preserve the essential composition of the portfolio 
as we allow it to grow (see, for instance, Example 8.17). 


(A3) There is some $C < \infty$ such that $\sum_{i=1}^m (e_i/i)^2 < C$ for all m.


This assumption prevents exposures from growing systematically with portfolio 
size. 

The following result shows that under these assumptions the average portfolio
loss is essentially determined by the asymptotic conditional loss function $\bar\ell$ and the
realization of the factor random vector $\Psi$. The proof is based on a suitable version
of the strong law of large numbers (see Frey and McNeil (2003) for details).


Proposition 8.15. Consider a sequence $L^{(m)} = \sum_{i=1}^m L_i$ satisfying Assumptions
(A1)-(A3) above. Denote by $P(\cdot \mid \Psi = \psi)$ the conditional distribution of
the sequence $(L_i)_{i \in \mathbb{N}}$ given $\Psi = \psi$. Then

\[ \lim_{m \to \infty} \frac{1}{m} L^{(m)} = \bar\ell(\psi), \quad P(\cdot \mid \Psi = \psi) \text{ a.s.} \]
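As a rough numerical illustration of Proposition 8.15, the sketch below fixes one realization of a one-dimensional factor and simulates the default rate for portfolios of growing size, taking unit exposures and 100% losses given default. The probit-normal form p(psi) = Phi(mu + sigma psi), the parameter values (the rating-B probit-normal entries of Table 8.6) and the function name are assumptions made for this illustration only.

```python
import random
from statistics import NormalDist


def conditional_default_rate(m, psi, mu=-1.71, sigma=0.264, seed=0):
    """Simulate M^(m)/m given Psi = psi in a one-factor probit-normal model.

    Returns the simulated default rate and the conditional default
    probability p(psi) = Phi(mu + sigma * psi) it should approach.
    """
    rng = random.Random(seed)
    p_psi = NormalDist().cdf(mu + sigma * psi)  # identical for all obligors here
    # Conditional on psi the defaults are independent Bernoulli(p_psi) trials.
    defaults = sum(rng.random() < p_psi for _ in range(m))
    return defaults / m, p_psi


for m in (100, 10_000, 200_000):
    rate, p_psi = conditional_default_rate(m, psi=2.0)
    print(m, round(rate, 4), round(p_psi, 4))
```

For a fixed factor value the simulated rate should settle near p(psi) as m grows; the remaining randomness of the portfolio loss then comes from the factor alone.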


Proposition 8.15 obviously applies to the number of defaults $M^{(m)} = \sum_{i=1}^m Y_i$
if we set $\delta_i = e_i = 1$. For a given sequence $(Y_i)_{i \in \mathbb{N}}$ following a p-factor Bernoulli
mixture model with default probabilities $p_i(\psi)$, Assumptions (A1) and (A3) are
automatically satisfied and (A2) becomes

\[ \lim_{m \to \infty} \frac{1}{m} \sum_{i=1}^m p_i(\psi) = \bar p(\psi) \quad \text{for some function } \bar p : \mathbb{R}^p \to [0, 1]. \tag{8.41} \]

For one-factor Bernoulli mixture models we can obtain a stronger result which
links the quantiles of $L^{(m)}$ to quantiles of the mixing distribution. Again we refer to
Frey and McNeil (2003) for a proof.


Proposition 8.16. Consider a sequence $L^{(m)} = \sum_{i=1}^m L_i$ satisfying Assumptions
(A1)-(A3) with a one-dimensional mixing variable $\Psi$ with df G. Assume
that the conditional asymptotic loss function $\bar\ell(\psi)$ is strictly increasing and right
continuous and that G is strictly increasing at $q_\alpha(\Psi)$, i.e. that $G(q_\alpha(\Psi) + \delta) > \alpha$
for every $\delta > 0$. Then

\[ \lim_{m \to \infty} \frac{1}{m} q_\alpha(L^{(m)}) = \bar\ell(q_\alpha(\Psi)). \tag{8.42} \]


Comments. The assumption that $\bar\ell$ is strictly increasing makes sense if it is assumed
that low values of $\psi$ correspond to good states of the world with lower conditional
default probabilities and lower losses given default than average, and that high values
of $\psi$ correspond to bad states with correspondingly higher losses given default.

It follows from Proposition 8.16 that the tail of the credit loss in large one-factor
Bernoulli mixture models is essentially driven by the tail of the mixing variable $\Psi$.
Consider in particular two exchangeable Bernoulli mixture models with mixing distributions
$G_i(q) = P(Q_i \le q)$, $i = 1, 2$. Suppose that the tail of $G_1$ is heavier than
the tail of $G_2$, i.e. that we have $G_1(q) < G_2(q)$ for q close to 1. Then Proposition 8.16
implies that for large m the tail of $M^{(m)}$ is heavier in model 1 than in
model 2.


Example 8.17. Consider the one-factor Bernoulli mixture model for k exchangeable
groups defined by (8.37). In this case equation (8.41) becomes

\[ \lim_{m \to \infty} \frac{1}{m} \sum_{r=1}^k m_r^{(m)} h(\mu_r + \sigma\psi) = \bar p(\psi) \]

for some function $\bar p$, which is fulfilled if the proportions of obligors in each
group, $m_r^{(m)}/m$, converge to fixed constants $\lambda_r$ as $m \to \infty$. Assuming unit exposures
and 100% losses given default, our asymptotic conditional loss function is
$\bar\ell(\psi) = \bar p(\psi) = \sum_{r=1}^k \lambda_r h(\mu_r + \sigma\psi)$. Since $\Psi$ is assumed to have a standard normal
distribution, (8.42) implies, for large m, that

\[ q_\alpha(L^{(m)}) \approx m \sum_{r=1}^k \lambda_r h(\mu_r + \sigma \Phi^{-1}(\alpha)). \tag{8.43} \]

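The approximation (8.43) is simple to evaluate. The sketch below assumes the probit link h = Phi for concreteness; the group proportions, the parameter values and the function name are hypothetical and serve only to show the mechanics.

```python
from statistics import NormalDist


def large_portfolio_quantile(m, alpha, lambdas, mus, sigma):
    """Approximate q_alpha(L^(m)) as in (8.43), assuming the link h = Phi.

    lambdas: limiting group proportions lambda_r (should sum to one)
    mus:     group-specific parameters mu_r
    sigma:   common factor loading
    """
    std_normal = NormalDist()
    psi_alpha = std_normal.inv_cdf(alpha)  # q_alpha(Psi) for Psi ~ N(0, 1)
    p_bar = sum(lam * std_normal.cdf(mu + sigma * psi_alpha)
                for lam, mu in zip(lambdas, mus))
    return m * p_bar


# Hypothetical two-group portfolio with 10,000 obligors.
q99 = large_portfolio_quantile(m=10_000, alpha=0.99,
                               lambdas=[0.7, 0.3], mus=[-2.37, -1.71], sigma=0.27)
q999 = large_portfolio_quantile(m=10_000, alpha=0.999,
                                lambdas=[0.7, 0.3], mus=[-2.37, -1.71], sigma=0.27)
print(round(q99, 1), round(q999, 1))
```

Only the quantile of the factor enters the calculation, which reflects the statement that the tail of the loss is driven by the tail of the mixing variable.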
8.4.4 Threshold Models as Mixture Models 


Although the mixture models of this section seem, at first glance, to be different in
structure from the threshold models of Section 8.3, it is important to realize that the
majority of useful threshold models, including all the examples we have given, can
be represented as Bernoulli mixture models. This is a very useful insight, because
the Bernoulli mixture format has a number of advantages over the threshold format.

• Bernoulli mixture models lend themselves to Monte Carlo risk studies. From
  the analyses of this section we obtain methods for sampling from many of
  the models we have discussed, such as the t copula threshold model used in
  Section 8.3.5.

• Mixture models are arguably more convenient for statistical fitting purposes.
  We show in Section 8.6.3 that statistical techniques for generalized linear
  mixed models can be used to fit mixture models to empirical default data
  gathered over several time periods.

• The large-portfolio behaviour of Bernoulli mixtures can be understood in
  terms of the behaviour of the distribution of the common economic factors,
  as was shown in Section 8.4.3.


The following condition ensures that a threshold model can be written as a 
Bernoulli mixture model. 


Definition 8.18. A random vector X has a p-dimensional conditional independence
structure with conditioning variable $\Psi$ if there is some p < m and a p-dimensional
random vector $\Psi = (\Psi_1, \ldots, \Psi_p)'$ such that, conditional on $\Psi$, the rvs $X_1, \ldots, X_m$
are independent.


Lemma 8.19. Let (X, D) be a threshold model for an m-dimensional random
vector X. If X has a p-dimensional conditional independence structure with conditioning
variable $\Psi$, then the default indicators $Y_i = I_{\{X_i \le d_{i1}\}}$ follow a Bernoulli
mixture model with factor $\Psi$, where the conditional default probabilities are given
by $p_i(\psi) = P(X_i \le d_{i1} \mid \Psi = \psi)$.


Proof. For $y \in \{0, 1\}^m$ define the set $B := \{1 \le i \le m : y_i = 1\}$ and let
$B^c = \{1, \ldots, m\} \setminus B$. We have

\[ P(Y = y \mid \Psi = \psi) = P\Big( \bigcap_{i \in B} \{X_i \le d_{i1}\} \cap \bigcap_{i \in B^c} \{X_i > d_{i1}\} \,\Big|\, \Psi = \psi \Big)
   = \prod_{i \in B} P(X_i \le d_{i1} \mid \Psi = \psi) \prod_{i \in B^c} \big(1 - P(X_i \le d_{i1} \mid \Psi = \psi)\big). \]

Hence, conditional on $\Psi = \psi$, the $Y_i$ are independent Bernoulli variables with
success probability $p_i(\psi) := P(X_i \le d_{i1} \mid \Psi = \psi)$.


Application to normal mixtures with factor structure. Suppose that the critical
variables $X = (X_1, \ldots, X_m)'$ have a normal mean-variance mixture distribution as
in Example 8.8 so that $X = m(W) + \sqrt{W} Z$ for W independent of Z. Suppose also
that Z (and hence X) follows the linear factor model (8.25), so that $Z = BF + \varepsilon$
for a random vector $F \sim N_p(0, \Omega)$, a loading matrix $B \in \mathbb{R}^{m \times p}$, and independent,
normally distributed rvs $\varepsilon_1, \ldots, \varepsilon_m$, which are also independent of F. Then X has
a (p + 1)-dimensional conditional independence structure.

To see this, define the random vector $\Psi = (F_1, \ldots, F_p, W)'$ and observe that,
conditional on $\Psi = \psi = (f_1, \ldots, f_p, w)'$, X is $N_m(m(w) + \sqrt{w} B f, w\Upsilon)$-distributed, where $\Upsilon$ is the
(diagonal) covariance matrix of $\varepsilon$. Since the covariance structure is diagonal, the
rvs $X_i$ are conditionally independent. For a threshold model (X, D), the equivalent
Bernoulli mixture model is now easy to compute. The conditional default probabilities
are

\[ p_i(\psi) = P(X_i \le d_{i1} \mid \Psi = \psi) = \Phi\Big( \frac{d_{i1} - m_i(w) - \sqrt{w}\, b_i' f}{\sqrt{w \upsilon_i}} \Big), \tag{8.44} \]

where $m_i(w)$ is the ith component of m(w), $b_i$ is the ith row of B, and $\upsilon_i$ is the ith
diagonal element of $\Upsilon$.


Example 8.20 (threshold model of KMV/CreditMetrics type). Consider the special
case of Gaussian critical variables where X = Z and $\Psi = F$. If we standardize
the critical variables $X_1, \ldots, X_m$ to have variance one and reparametrize the formula
in terms of the individual default probabilities $p_i$ and the systematic variance
component $\beta_i = b_i' \Omega b_i = 1 - \upsilon_i$ (see Example 8.6), we can infer from (8.44) that

\[ p_i(\psi) = \Phi\Big( \frac{\Phi^{-1}(p_i) - b_i' \psi}{\sqrt{1 - \beta_i}} \Big). \tag{8.45} \]

By comparison with Example 8.12 we see that the individual stochastic default
probabilities $p_i(\Psi)$ have a probit-normal distribution with parameters $\mu_i$ and $\sigma_i$
given by

\[ \mu_i = \Phi^{-1}(p_i) / \sqrt{1 - \beta_i} \quad \text{and} \quad \sigma_i^2 = \beta_i / (1 - \beta_i). \]

Example 8.21 (threshold model with Student t copula). Consider the special
case of multivariate t-distributed critical variables where $X = \sqrt{W} Z$ and
$W \sim \mathrm{Ig}(\tfrac12 \nu, \tfrac12 \nu)$. Suppose that the margins of $X_1, \ldots, X_m$ have been standardized to be
standard univariate t with $\nu$ degrees of freedom. Again writing $\beta_i$ for the proportion
of the variance of the critical variable $X_i$ explained by the factors F, we infer
from (8.44) that

\[ p_i(\psi) = \Phi\Big( \frac{t_\nu^{-1}(p_i)\, w^{-1/2} - b_i' f}{\sqrt{1 - \beta_i}} \Big). \tag{8.46} \]

The formula (8.44) is the key to Monte Carlo simulation for threshold models
when the critical variables have a normal mixture distribution, particularly in a
large-portfolio context. For example, rather than simulating an m-dimensional t distribution
to implement the t model, one only needs to simulate a p-dimensional normal
vector F with $p \ll m$ and an independent gamma-distributed variate $V = W^{-1}$.
In the second step of the simulation one simply conducts a series of independent
Bernoulli experiments with default probabilities $p_i(\Psi)$ to decide whether individual
companies default.
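The two-step simulation just described can be sketched as follows for a homogeneous one-factor t model. The sketch assumes m(w) = 0, a single standard normal factor with loading sqrt(beta), and a common default threshold d stated directly on the scale of the critical variable (rather than as a t quantile); these choices, the parameter values and all function names are hypothetical. A direct threshold simulation of a single obligor is included only as a cross-check.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf


def mixture_default_prob(d, beta, nu, rng):
    """One draw of p(Psi) from (8.44) for X = sqrt(W) (sqrt(beta) F + sqrt(1-beta) eps)."""
    f = rng.gauss(0.0, 1.0)                    # systematic factor F
    v = rng.gammavariate(nu / 2.0, 2.0 / nu)   # V = 1/W, gamma with shape nu/2 and mean 1
    # (8.44) with m(w) = 0, b = sqrt(beta), upsilon = 1 - beta and w = 1/v.
    return PHI((d * math.sqrt(v) - math.sqrt(beta) * f) / math.sqrt(1.0 - beta))


def simulate_defaults(m, d, beta, nu, rng):
    """Step 1: draw Psi = (F, W). Step 2: m independent Bernoulli(p(Psi)) trials."""
    p = mixture_default_prob(d, beta, nu, rng)
    return sum(rng.random() < p for _ in range(m))


def direct_default_indicator(d, beta, nu, rng):
    """Simulate one critical variable directly and compare it with the threshold."""
    w = 1.0 / rng.gammavariate(nu / 2.0, 2.0 / nu)
    z = math.sqrt(beta) * rng.gauss(0.0, 1.0) + math.sqrt(1.0 - beta) * rng.gauss(0.0, 1.0)
    return math.sqrt(w) * z <= d


rng = random.Random(1)
n = 20_000
via_mixture = sum(mixture_default_prob(-2.0, 0.3, 4, rng) for _ in range(n)) / n
via_threshold = sum(direct_default_indicator(-2.0, 0.3, 4, rng) for _ in range(n)) / n
print(round(via_mixture, 4), round(via_threshold, 4))
print(simulate_defaults(1000, -2.0, 0.3, 4, rng))
```

Both estimates target the same marginal default probability; the mixture route needs only two random numbers per scenario for the factors, whatever the portfolio size.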


Application to Archimedean copula models. Another class of threshold models
with an equivalent mixture representation is provided by models where the critical
variables have an exchangeable LT-Archimedean copula in the sense of Definition 5.47.
Consider a threshold model (X, D), where X has an exchangeable
LT-Archimedean copula C with generator $\phi$ such that $\phi^{-1}$ is the Laplace transform
of some df G on $[0, \infty)$ with G(0) = 0. Let $d = (d_{11}, \ldots, d_{m1})'$ denote the first
column of D containing the default thresholds and write (X, d) for a threshold
model of default with Archimedean copula dependence. Write $p_i = P(X_i \le d_{i1})$
as usual and $p = (p_1, \ldots, p_m)'$ for the vector of default probabilities.

Consider now a non-negative rv $\Psi \sim G$ and rvs $U_1, \ldots, U_m$ that are conditionally
independent given $\Psi$ with conditional distribution function
$P(U_i \le u \mid \Psi = \psi) = \exp(-\psi \phi(u))$ for $u \in [0, 1]$. Then Proposition 5.46 shows that U has df C.
Moreover, by Lemma 8.5, (X, d) and (U, p) are two equivalent threshold models
for default. By construction U has a one-dimensional conditional independence
structure with conditioning variable $\Psi$ and the conditional default probabilities are
given by

\[ p_i(\psi) = P(U_i \le p_i \mid \Psi = \psi) = \exp(-\psi \phi(p_i)). \tag{8.47} \]


In order to simulate from a threshold model based on an LT-Archimedean copula
we may therefore use the following efficient and simple approach. In a first step
we simulate a realization $\psi$ of $\Psi$ and then we conduct m independent Bernoulli
experiments with default probabilities $p_i(\psi)$ as in (8.47) to simulate a realization
of defaulting counterparties.


Example 8.22 (the Clayton copula). As an example consider the Clayton copula
with generator $\phi(t) = t^{-\theta} - 1$. Suppose we wish to construct an exchangeable
Bernoulli mixture model with default probability $\pi$ and joint default probability
$\pi_2$ that is equivalent to a threshold model driven by the Clayton copula.
As mentioned in Algorithm 5.48, an rv $\Psi \sim \mathrm{Ga}(1/\theta, 1)$ (see Section A.2.4) has
Laplace transform equal to the generator inverse $\phi^{-1}(t) = (t + 1)^{-1/\theta}$, so the mixing
variable of the equivalent Bernoulli mixture model can be defined by setting
$Q = \exp(-\Psi(\pi^{-\theta} - 1))$.

Using (8.23), the required value of $\theta$ to give the desired joint default probabilities
is the solution to the equation $\pi_2 = C_\theta(\pi, \pi) = (2\pi^{-\theta} - 1)^{-1/\theta}$, $\theta > 0$. It is
easily seen that $\pi_2$ and, hence, the default correlation in our exchangeable Bernoulli
mixture model are increasing in $\theta$; for $\theta \to 0$ we obtain independent defaults and
for $\theta \to \infty$ defaults become comonotonic and default correlation tends to one.
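The calibration and sampling steps of Example 8.22 can be sketched as follows. The bisection search relies on the monotonicity of the joint default probability in theta noted above; the search interval, tolerance and function names are assumptions of this sketch. The inputs pi = 0.049 and pi_2 = 0.00313 are the rating-B values quoted in Table 8.6, which reports theta = 0.032 for the Clayton model.

```python
import math
import random


def clayton_joint_default_prob(pi, theta):
    """pi_2 = C_theta(pi, pi) = (2 pi^(-theta) - 1)^(-1/theta)."""
    return (2.0 * pi ** (-theta) - 1.0) ** (-1.0 / theta)


def calibrate_theta(pi, pi2, lo=1e-6, hi=50.0, tol=1e-10):
    """Solve C_theta(pi, pi) = pi2 by bisection (pi_2 is increasing in theta)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if clayton_joint_default_prob(pi, mid) < pi2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_q(pi, theta, rng):
    """One draw of Q = exp(-Psi (pi^(-theta) - 1)) with Psi ~ Ga(1/theta, 1)."""
    psi = rng.gammavariate(1.0 / theta, 1.0)
    return math.exp(-psi * (pi ** (-theta) - 1.0))


theta = calibrate_theta(pi=0.049, pi2=0.00313)
rng = random.Random(0)
qs = [sample_q(0.049, theta, rng) for _ in range(50_000)]
mean_q = sum(qs) / len(qs)                    # estimates pi
mean_q2 = sum(q * q for q in qs) / len(qs)    # estimates pi_2
print(round(theta, 4), round(mean_q, 4), round(mean_q2, 5))
```

Given a draw of Q, defaults in the exchangeable portfolio are then independent Bernoulli trials with success probability Q.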


8.4.5 Model-Theoretic Aspects of Basel II 


In this section we examine how the considerations of Sections 8.4.3 and 8.4.4 have 
influenced the new Basel II capital-adequacy framework, which was discussed in 
more general terms in Section 1.3. Under this framework a bank is required to hold 
8% of the so-called risk-weighted assets (RWA) of its credit portfolio as risk capital. 
The RWA of a portfolio is given by the sum of the RWA of the individual risks in
the portfolio, i.e. $\mathrm{RWA}^{\text{portfolio}} = \sum_{i=1}^m \mathrm{RWA}_i$. The quantity $\mathrm{RWA}_i$ reflects exposure
size and riskiness of obligor i; it takes the form $\mathrm{RWA}_i = w_i e_i$, where $w_i$ is a risk
weight and $e_i$ denotes exposure size.

Banks may choose between two options for determining the risk weight $w_i$,
which must then be implemented for the entire portfolio. Under the simpler standardized
approach, the risk weight $w_i$ is determined by the type (sovereign, bank
or corporation) and the credit rating of counterparty i. For instance, $w_i = 50\%$ for
a corporation with a Moody’s rating in the range of A+ to A−. Under the more
advanced internal-ratings-based (IRB) approach, the risk weight takes the form

\[ w_i = (0.08)^{-1} c\, \delta_i\, \Phi\Big( \frac{\Phi^{-1}(p_i) + \sqrt{\rho}\, \Phi^{-1}(0.999)}{\sqrt{1 - \rho}} \Big). \tag{8.48} \]

Here c is a technical adjustment factor that is of minor interest to us, $p_i$ represents the
marginal default probability, and $\delta_i$ is the percentage loss given default of obligor i.
The parameter $\rho \in (0.12, 0.24)$ can be viewed as an asset correlation, as will be
explained below. Estimates for $p_i$ and (under the so-called advanced IRB approach)
for $\delta_i$ and $e_i$ are provided by the individual bank; the adjustment factor c and, most
importantly, the value of $\rho$ are determined by fixed rules within the Basel II Accord
independently of the structure of the specific portfolio under consideration. The risk
capital to be held for counterparty i is thus given by

\[ \mathrm{RC}_i = 0.08\, \mathrm{RWA}_i = c\, \delta_i e_i\, \Phi\Big( \frac{\Phi^{-1}(p_i) + \sqrt{\rho}\, \Phi^{-1}(0.999)}{\sqrt{1 - \rho}} \Big). \tag{8.49} \]

The interesting part of equation (8.49) is, of course, the expression involving the
standard normal df, and we now give a derivation. Consider a one-factor threshold
model of KMV/CreditMetrics type with marginal default probabilities $p_1, \ldots, p_m$
and critical variables given by

\[ X_i = \sqrt{\rho}\, F + \sqrt{1 - \rho}\, \varepsilon_i \tag{8.50} \]

for iid standard normal rvs $F, \varepsilon_1, \ldots, \varepsilon_m$. It follows from Example 8.20 that
an equivalent Bernoulli mixture model can be constructed by setting $\Psi = -F$
(the sign change facilitates the derivation) and conditional default probabilities
$p_i(\psi) = \Phi((\Phi^{-1}(p_i) + \sqrt{\rho}\, \psi)/\sqrt{1 - \rho})$. Assume, moreover, that the loss given
default of the firms is deterministic and equal to $\delta_i e_i$ and that exposures are relatively
homogeneous. According to Proposition 8.16, under these assumptions the
quantiles of the portfolio loss $L = \sum_{i=1}^m \delta_i e_i Y_i$ satisfy, for m large, the asymptotic
relation

\[ q_\alpha(L) \approx \sum_{i=1}^m \delta_i e_i\, p_i(q_\alpha(\Psi)) = \sum_{i=1}^m \delta_i e_i\, \Phi\Big( \frac{\Phi^{-1}(p_i) + \sqrt{\rho}\, \Phi^{-1}(\alpha)}{\sqrt{1 - \rho}} \Big). \]

For c = 1, the risk capital $\mathrm{RC}_i$ in (8.49) can thus be considered as the asymptotic
contribution of risk i to the 99.9% VaR of the overall portfolio in a one-factor
Gaussian threshold model with asset correlation $\rho$.

Figure 8.3. Tail of the mixing distribution of Q in four different exchangeable Bernoulli
mixture models: beta; probit-normal (one-factor KMV/CreditMetrics); logit-normal
(CreditPortfolioView); Clayton. In all cases the first two moments have the values $\pi = 0.049$ and
$\pi_2 = 0.00313$, which correspond roughly to Standard & Poor’s rating category B; the actual
parameter values can be found in Table 8.6. The horizontal line at $10^{-2}$ shows that the models
only really start to differ around the 99th percentile of the mixing distribution.

While formula (8.48) is influenced by portfolio-theoretic considerations, the new
Basel II framework falls short of reflecting the true dependence structure of a bank’s
credit portfolio for a number of reasons: first, in the Basel II framework the correlation
parameter $\rho$ is specified ad hoc by regulatory rules irrespective of “true”
asset correlations; second, the simple one-factor model (8.50) is typically an oversimplified
representation of the factor structure underlying default dependence, particularly
for internationally active banks; third, the rule is based on an asymptotic
result. Moreover, historical default experience for the portfolio under consideration
has no formal role to play in setting capital-adequacy standards. For these reasons
the IRB approach is heavily debated in the risk-management community, and it is
widely expected that, with improved availability of credit loss data, in the long run
regulators will permit the use of internal portfolio models for capital-adequacy purposes
for credit risk, as was allowed for market risk in the 1996 Amendment of the
first Basel Accord (see Section 1.2.2).
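As a sketch of how the risk capital formula (8.49) of this section is evaluated, the following code implements the expression involving the standard normal df with the adjustment factor c left as a plain parameter. The function names and the example obligor (default probability 1%, loss given default 45%, exposure 1,000,000, rho = 0.2) are hypothetical, and the regulatory rules that fix c and rho are not modelled.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()


def irb_risk_capital(p, delta, e, rho, c=1.0, alpha=0.999):
    """RC_i = c delta_i e_i Phi((Phi^-1(p_i) + sqrt(rho) Phi^-1(alpha)) / sqrt(1 - rho))."""
    stressed_pd = _N.cdf((_N.inv_cdf(p) + sqrt(rho) * _N.inv_cdf(alpha)) / sqrt(1.0 - rho))
    return c * delta * e * stressed_pd


def irb_risk_weight(p, delta, rho, c=1.0):
    """Risk weight w_i as in (8.48), so that RC_i = 0.08 * w_i * e_i."""
    return irb_risk_capital(p, delta, 1.0, rho, c) / 0.08


# Hypothetical obligor.
rc = irb_risk_capital(p=0.01, delta=0.45, e=1_000_000, rho=0.2)
w = irb_risk_weight(p=0.01, delta=0.45, rho=0.2)
print(round(rc), round(w, 3))
```

With rho = 0 the argument of the normal df collapses to the inverse of p_i, so the capital reduces to c times the expected loss; the asset correlation is what lifts the default probability to its stressed value.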


8.4.6 Model Risk Issues 


In this section, which is complementary to Section 8.3.5, we look briefly at an
aspect of model risk for Bernoulli mixture models. We consider an exchangeable
Bernoulli mixture model for a homogeneous portfolio and investigate the risk related
to the choice of mixing distribution under the constraint that the default probability
$\pi$ and the default correlation $\rho_Y$ (or equivalently $\pi$ and $\pi_2$) are known and
fixed.

According to Proposition 8.16, the tail of M is essentially determined by the tail
of the mixing variable Q. In Figure 8.3 we plot the tail function of the probit-normal
distribution (corresponding to a one-factor KMV/CreditMetrics model), the logit-normal
distribution (corresponding to CreditPortfolioView), the beta distribution
(close to CreditRisk+) and the mixture distribution (corresponding to the Clayton
copula; see Example 8.22). The plots are shown on a logarithmic scale and in all
cases the first two moments have the values $\pi = 0.049$ and $\pi_2 = 0.00313$, which
correspond roughly to Standard & Poor’s rating category B; the parameter values
for each of the models can be found in Table 8.6.

Inspection of Figure 8.3 shows that the tail functions differ significantly only after
the 99% quantile, the logit-normal distribution being the one with the heaviest tail.

Table 8.6. Parameter values for various exchangeable Bernoulli mixture models with identical
values of $\pi$ and $\pi_2$ (and $\rho_Y$). The values of $\pi$ and $\pi_2$ correspond roughly to Standard &
Poor’s ratings CCC, B and BB (in fact, they have been estimated from 20 years of Standard
& Poor’s default data using the simple moment estimator in (8.61)). This table is used in the
model-risk study of Section 8.4.6 and the simulation study of Section 8.6.2.

Model           Parameter   CCC       B          BB
All models      π           0.188     0.049      0.0112
                π_2         0.042     0.00313    0.000197
                ρ_Y         0.0446    0.0157     0.00643
Beta            a           4.02      3.08       1.73
                b           17.4      59.8       153
Probit-normal   μ           −0.93     −1.71      −2.37
                σ           0.316     0.264      0.272
Logit-normal    μ           −1.56     −3.1       −4.71
                σ           0.553     0.556      0.691
Clayton         π           0.188     0.049      0.0112
                θ           0.0704    0.032      0.0247


From a practical point of view this means that the particular parametric form of the
mixing distribution in a Bernoulli mixture model is of minor importance once $\pi$
and $\rho_Y$ have been fixed. Of course this does not mean that Bernoulli mixtures are
immune to model risk; the tail of M is quite sensitive to $\pi$ and in particular to $\rho_Y$,
and these parameters are not easily estimated (see Section 8.6.4 for a discussion of
statistical inference for mixture models).
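The matching of the first two moments across the models in Table 8.6 can be checked numerically. The sketch below uses standard identities that are not spelled out in this section and are therefore stated here as assumptions: the moments of a beta distribution, the closed form E(Phi(mu + sigma Z)) = Phi(mu / sqrt(1 + sigma^2)) for the probit-normal mean, and the default correlation of an exchangeable model written as (pi_2 - pi^2)/(pi - pi^2). Function names are hypothetical; the parameter values are the rating-B column of Table 8.6.

```python
from math import sqrt
from statistics import NormalDist


def default_correlation(pi, pi2):
    """rho_Y = (pi_2 - pi^2) / (pi - pi^2) for exchangeable default indicators."""
    return (pi2 - pi * pi) / (pi - pi * pi)


def beta_moments(a, b):
    """First two moments E(Q), E(Q^2) of Q ~ Beta(a, b)."""
    pi = a / (a + b)
    pi2 = a * (a + 1.0) / ((a + b) * (a + b + 1.0))
    return pi, pi2


def probit_normal_mean(mu, sigma):
    """E(Phi(mu + sigma Z)) for Z ~ N(0, 1)."""
    return NormalDist().cdf(mu / sqrt(1.0 + sigma * sigma))


pi_beta, pi2_beta = beta_moments(3.08, 59.8)   # Beta model, rating B
rho_beta = default_correlation(pi_beta, pi2_beta)
pi_probit = probit_normal_mean(-1.71, 0.264)   # probit-normal model, rating B
print(round(pi_beta, 4), round(pi2_beta, 5), round(rho_beta, 4), round(pi_probit, 4))
```

Small discrepancies from the tabulated values are to be expected because the table reports rounded parameters.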


Systematic recovery risk. Another important source of model risk in credit risk 
management models is the modelling of loss given default or equivalently of the 
recovery rates. In standard portfolio risk models it is assumed that the loss given 
default is independent of the default event. However, one expects the loss given 
default to depend on the same risk factors as default probabilities; in that case 
we speak of systematic recovery risk. The presence of systematic recovery risk is 
confirmed in a number of empirical studies. In particular, Frye (2000) has carried 
out a formal empirical analysis using recovery data collected by Moody’s on rated 
corporate bonds. He found that recovery rates are substantially lower than average 
in times of economic recession. To quote from his paper: 


Using that data [the Moody’s data] to estimate an appropriate credit 
model, we can extrapolate that in a severe economic downturn recover- 
ies might decline 20-25 percentage points from the normal-year aver- 
age. This could cause loss given default to increase by nearly 100% and 
to have a similar effect on economic capital. Such systematic recovery 
risk is absent from first-generation credit risk models. Therefore these 
models may significantly understate the capital required at banking 
institutions.

Clearly, this calls for the inclusion of systematic recovery risk in standard credit risk
models. The challenge is not in building models allowing for systematic recovery
risk (this can be accomplished easily in the framework of Section 8.4.3) but in
estimating the dependence of the loss given default $\delta_i$ on the economic factors. At
present there are few empirical studies dealing with recovery risk; a good survey is
Schuermann (2003).


Notes and Comments 


The logit-normal mixture model can be thought of as a one-factor version of the
CreditPortfolioView model of Wilson (1997a,b). Details of this model can be found
in Section 5 of Crouhy, Galai and Mark (2000). Further details of the beta binomial 
distribution can be found in Joe (1997). 

The rating agency Moody’s uses a so-called binomial expansion technique to 
model default dependence in a simplistic way. The method, which is very popular 
with practitioners, is not based on a formal default risk model, but is related to 
binomial distributions. The basic idea is to approximate a portfolio of m dependent 
counterparties by a homogeneous portfolio of d < m independent counterparties 
with adjusted exposures and identical default probabilities; the index d is called the 
diversity score and is chosen according to rules defined by Moody’s. For further 
information we refer to Davis and Lo (2001) and Section 9.2.7 of Lando (2004). 

A comprehensive description of CreditRisk+ is given in the original manual for
CreditRisk+ (Credit Suisse Financial Products 1997). An excellent discussion of
the model structure from a more academic viewpoint is provided in Gordy (2000).
Both sources also provide further information on the calibration of the factor variances
$\sigma_j^2$ and factor weights $w_{ij}$. The derivation of recursion formulas for the probabilities
$P(M = k)$, $k = 0, 1, \ldots$, via Panjer recursion is given in Appendix A10
of CreditRisk+ (Credit Suisse Financial Products 1997). In Gordy (2002) an alter- 
native approach to the computation of the loss distribution in CreditRisk+ is 
proposed—one which uses the saddle-point approximation (see, for instance, Jensen 
1995). Further numerical work for CreditRisk+ can be found in papers by Kurth 
and Tasche (2003), Glasserman (2003b) and Haaf, Reiss and Schoenmakers (2004). 
Importance-sampling techniques for CreditRisk+ are discussed in Glasserman and 
Li (2003b). In Frey and McNeil (2002) it is shown that the Bernoulli mixture model 
corresponding to a one-factor exchangeable version of CreditRisk+ is very close to 
an exchangeable Bernoulli mixture model with beta mixing distribution. 

The results in Section 8.4.3 are taken from Frey and McNeil (2003); related results 
have been derived by Gordy (2001). The first limit result for large portfolios was 
obtained in Vasicek (1997) for a probit-normal mixture model equivalent to the 
KMV model. Asymptotic results for credit portfolios related to the theory of large 
deviations are discussed in Dembo, Deuschel and Duffie (2004). 

The equivalence between threshold models and mixture models has been observed
by Koyluoglu and Hickman (1998) and Gordy (2000) for the special case of CreditMetrics
and CreditRisk+. Applications of Proposition 5.46 to credit risk modelling
are also discussed in Schönbucher (2002). It is of course possible to develop multi-state
mixture models to describe credit migrations as well as defaults and to derive
equivalence results between multi-state latent threshold models and multi-state mixture
models; see, for instance, Section 4.4 of Frey and McNeil (2001) for an application
to credit risk and Joe (1997) for mathematical background information. The
study of mixture representations for sequences of exchangeable Bernoulli rvs is
related to a well-known result of de Finetti, which states that any infinite sequence
$Y_1, Y_2, \ldots$ of exchangeable Bernoulli rvs has a representation as an exchangeable
Bernoulli mixture; see, for instance, Theorem 35.10 in Billingsley (1995) for a
precise statement. Hence any exchangeable model for Y that can be extended to
arbitrary portfolio size m has a representation as an exchangeable Bernoulli mixture
model.

For details of the IRB approach, and the Basel II Capital Accord in general, we 
refer to the website of the Basel Committee: www.bis.org/bcbs. Our discussion in 
Section 8.4.5 is related to the analysis by Gordy (2001). 


8.5 Monte Carlo Methods 


In this section we consider a Bernoulli mixture model for a loan portfolio and assume
that the overall loss is of the form $L = \sum_{i=1}^m L_i$, where the $L_i$ are conditionally independent
given some economic factor vector $\Psi$. A possible method for calculating
risk measures and related quantities such as capital allocations is to use Monte Carlo
(MC) simulation, although the problem of rare-event simulation arises. Suppose,
for example, that we wish to compute expected shortfall and expected shortfall
contributions at the confidence level $\alpha$ for our portfolio. We need to evaluate the
conditional expectations

\[ E(L \mid L > q_\alpha(L)) \quad \text{and} \quad E(L_i \mid L > q_\alpha(L)). \tag{8.51} \]

If $\alpha = 0.99$, say, then only 1% of our standard Monte Carlo draws will lead to a portfolio
loss higher than $q_{0.99}(L)$. The standard MC estimator of (8.51), which consists
of averaging the simulated values of L or $L_i$ over all draws leading to a simulated
portfolio loss $L > q_\alpha(L)$, will be unstable and subject to high variability, unless the
number of simulations is very large. The problem is of course that most simulations
are “wasted”, in that they lead to a value of L which is smaller than $q_\alpha(L)$. Fortunately,
there exists a variance-reduction technique known as importance sampling
(IS), which is well suited to such problems.


8.5.1 Basics of Importance Sampling 


Consider an rv X on some probability space $(\Omega, \mathcal{F}, P)$ and assume that it has an
absolutely continuous df with density f. A generalization to general probability
spaces is discussed below. The problem we consider is the computation of the
expected value

\[ \theta = E(h(X)) = \int_{-\infty}^{\infty} h(x) f(x)\, dx \tag{8.52} \]

for some known function h. To calculate the probability of an event we consider
a function of the form $h(x) = I_{\{x \in A\}}$ for some set $A \subset \mathbb{R}$; for expected shortfall
computation we consider functions of the form $h(x) = x I_{\{x > c\}}$ for some $c \in \mathbb{R}$.
Where the analytical evaluation of (8.52) is difficult, due to the complexity of the
distribution of X, we can resort to an MC approach, for which we only have to be
able to simulate variates from the distribution with density f.


Algorithm 8.23 (Monte Carlo integration).


(1) Generate $X_1, \ldots, X_n$ independently from density f.
(2) Compute the standard MC estimate $\hat\theta_n^{MC} = (1/n) \sum_{i=1}^n h(X_i)$.


The MC estimator converges to $\theta$ by the strong law of large numbers, but the
speed of convergence may be slow, particularly when we are dealing
with rare-event simulation.
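To make the rare-event problem concrete, the following sketch applies Algorithm 8.23 to the tail probability of a standard normal rv. The choice of distribution, thresholds, sample size and function name are assumptions made for illustration.

```python
import math
import random


def mc_tail_probability(c, n, seed=0):
    """Standard MC estimate of P(X > c) for X ~ N(0, 1).

    Returns the estimate, the number of draws exceeding c, and the
    estimated standard error of the estimate.
    """
    rng = random.Random(seed)
    hits = sum(rng.gauss(0.0, 1.0) > c for _ in range(n))
    estimate = hits / n
    std_error = math.sqrt(estimate * (1.0 - estimate) / n)
    return estimate, hits, std_error


for c in (2.0, 4.0):
    est, hits, se = mc_tail_probability(c, n=50_000)
    print(c, est, hits, se)
```

For the moderate threshold the estimate rests on many exceedances, whereas for the extreme threshold only a handful of draws (possibly none) land in the region of interest, so the estimate is dominated by sampling noise.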

Importance sampling is based on an alternative representation of the integral 
in (8.52). Consider a second probability density g (whose support should contain 
that of f) and define the likelihood ratio r(x) by r(x) := f(x)/g(x) whenever 
g(x) > 0, and r(x) = 0 otherwise. The integral (8.52) may be written in terms of 
the likelihood ratio as 


\[ \theta = \int_{-\infty}^{\infty} h(x) r(x) g(x)\, dx = E_g(h(X) r(X)), \tag{8.53} \]


where $E_g$ denotes expectation with respect to the density g. Hence we can approximate
the integral with the following algorithm.


Algorithm 8.24 (importance sampling).


(1) Generate $X_1, \ldots, X_n$ independently from density g.
(2) Compute the IS estimate $\hat\theta_n^{IS} = (1/n) \sum_{i=1}^n h(X_i) r(X_i)$.
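Algorithm 8.24 translates almost line by line into code. In the sketch below the function signature is hypothetical, and the illustration (an exponential rv with an exponential IS density of smaller rate) is chosen only because the exact answer exp(-10) is known; it is not taken from the text.

```python
import math
import random


def importance_sampling(h, f, g, sample_g, n, rng):
    """Algorithm 8.24: average h(X) r(X) over draws X ~ g, where r = f / g."""
    total = 0.0
    for _ in range(n):
        x = sample_g(rng)
        total += h(x) * f(x) / g(x)   # g(x) > 0 for every x drawn from g
    return total / n


# Illustration: X ~ Exp(1) and theta = P(X > 10) = exp(-10).
c, lam = 10.0, 0.1   # IS density g: exponential with rate lam, i.e. mean 1/lam = c
estimate = importance_sampling(
    h=lambda x: 1.0 if x > c else 0.0,
    f=lambda x: math.exp(-x),
    g=lambda x: lam * math.exp(-lam * x),
    sample_g=lambda rng: rng.expovariate(lam),
    n=100_000,
    rng=random.Random(0),
)
print(estimate, math.exp(-c))
```

Under g roughly a third of the draws exceed c, and the likelihood ratio corrects for the fact that such draws are far less likely under f.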


The density g is often termed the importance-sampling density. The art (or sci- 
ence) of importance sampling is in choosing an importance-sampling density such 
that, for fixed n, the variance of the IS estimator is considerably smaller than that of 
the standard Monte Carlo estimator. In this way we can hope to obtain a prescribed 
accuracy in evaluating the integral of interest using far fewer random draws than are 
required in standard Monte Carlo simulation. The variances of the estimators are 
given by 


\[ \mathrm{var}_g(\hat\theta_n^{IS}) = (1/n)\big(E_g(h(X)^2 r(X)^2) - \theta^2\big), \]
\[ \mathrm{var}(\hat\theta_n^{MC}) = (1/n)\big(E(h(X)^2) - \theta^2\big), \]

so that the aim is to make $E_g(h(X)^2 r(X)^2)$ small compared with $E(h(X)^2)$. In
theory, the variance of $\hat\theta_n^{IS}$ can be reduced to zero by choosing an optimal g. To see
this, suppose for the moment that h is non-negative and set

\[ g^*(x) = f(x) h(x) / E(h(X)). \tag{8.54} \]

With this choice, the likelihood ratio becomes $r(x) = E(h(X))/h(x)$. Hence
$\hat\theta_1^{IS} = h(X_1) r(X_1) = E(h(X))$, and the IS estimator gives the correct answer in
a single draw. In practice, it is of course impossible to choose an IS density of the
form (8.54), as this requires knowledge of the quantity E(h(X)) that one wants to
compute; nonetheless, (8.54) can provide useful guidance in choosing an IS density,
as we will see in the next section.

Consider the case of estimating a rare-event probability corresponding to
$h(x) = I_{\{x \ge c\}}$ for c significantly larger than the mean of X. Then we have that
$E(h(X)^2) = P(X \ge c)$ and, using (8.53), that

\[ E_g(h(X)^2 r(X)^2) = E_g(r(X)^2; X \ge c) = E(r(X); X \ge c). \tag{8.55} \]

Clearly, we should try to choose g such that the likelihood ratio $r(x) = f(x)/g(x)$
is small for $x \ge c$; in other words, we should make the event $\{X \ge c\}$ more likely
under the IS density g than it is under the original density f.


Exponential tilting. We now describe a useful way of finding IS densities when
X is light tailed. For $t \in \mathbb{R}$ we write $M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx$ for the
moment-generating function of X, which we assume is finite for $t \in \mathbb{R}$. If $M_X(t)$ is
finite, we can define an IS density by $g_t(x) := e^{tx} f(x)/M_X(t)$. The likelihood ratio
is $r_t(x) = f(x)/g_t(x) = M_X(t) e^{-tx}$. Define $\mu_t$ to be the mean of X with respect
to the density $g_t$, i.e.

\[ \mu_t = E_{g_t}(X) = E(X \exp(tX)) / M_X(t). \tag{8.56} \]

How can we choose t optimally for a particular importance-sampling problem?
We consider the case of tail probability estimation and recall from (8.55) that the
objective is to make

\[ E(r_t(X); X \ge c) = E\big(I_{\{X \ge c\}} M_X(t) e^{-tX}\big) \tag{8.57} \]

small. Now observe that $e^{-tx} \le e^{-tc}$ for $x \ge c$ and $t \ge 0$, so

\[ E\big(I_{\{X \ge c\}} M_X(t) e^{-tX}\big) \le M_X(t) e^{-tc}. \]

Instead of solving the (difficult) problem of minimizing (8.57) over t, we choose
t so that this bound becomes minimal. Equivalently, we try to find t minimizing
$\ln M_X(t) - tc$. Using (8.56) we obtain that

\[ \frac{d}{dt} \big( \ln M_X(t) - tc \big) = \frac{E(X \exp(tX))}{M_X(t)} - c = \mu_t - c, \]

which suggests choosing t = t(c) as the solution of the equation $\mu_t = c$, so that the
rare event $\{X \ge c\}$ becomes a normal event if we compute probabilities using the
density $g_{t(c)}$. A unique solution of the equation $\mu_t = c$ exists for all relevant values
of c. In the cases that are of interest to us this is immediately obvious from the form
of the exponentially tilted distributions, so we omit a formal proof.

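When no closed form for t(c) is available, the equation mu_t = c can be solved numerically. The sketch below does this by bisection for an rv with finitely many values, which is an assumption of the sketch (the text above works with a density); it relies on the fact that the tilted mean is non-decreasing in t because its derivative is the variance under the tilted distribution. Function names, the search interval and the coin-toss example are hypothetical.

```python
import math


def tilted_mean(values, probs, t):
    """mu_t = E(X exp(tX)) / M_X(t) for an rv taking finitely many values."""
    weights = [p * math.exp(t * x) for x, p in zip(values, probs)]
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


def solve_tilt(values, probs, c, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for mu_t = c; requires min(values) < c < max(values).

    The interval [lo, hi] must contain the solution, and |t * x| must stay
    small enough for math.exp not to overflow.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_mean(values, probs, mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical example: one fair coin toss, tilted so that the mean becomes 0.9.
t_c = solve_tilt([0.0, 1.0], [0.5, 0.5], c=0.9)
print(t_c, tilted_mean([0.0, 1.0], [0.5, 0.5], t_c))
```

For the coin toss the tilted mean is exp(t)/(1 + exp(t)), so the numerical root can be compared with the closed-form solution ln(c/(1 - c)).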


Example 8.25 (exponential tilting for normal distribution). We illustrate the
concept of exponential tilting in the simple case of a standard normal rv. Suppose
that $X \sim N(0, 1)$ with density $\phi(x)$. Using exponential tilting we obtain the new
density $g_t(x) = \exp(tx)\phi(x)/M_X(t)$. The moment-generating function of $X$ is
known to be $M_X(t) = \exp(\tfrac12 t^2)$. Hence

$$g_t(x) = \frac{1}{\sqrt{2\pi}} \exp\bigl(tx - \tfrac12(t^2 + x^2)\bigr) = \frac{1}{\sqrt{2\pi}} \exp\bigl(-\tfrac12(x - t)^2\bigr),$$

so that, under the tilted distribution, $X \sim N(t, 1)$. Note that in this case exponential
tilting corresponds to changing the mean of $X$.
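
The following sketch illustrates Example 8.25 numerically. It estimates $P(X > c)$ for a standard normal rv by sampling from the tilted density $N(t, 1)$ with $t = c$ (the solution of $\mu_t = c$ in this case) and weighting each draw by the likelihood ratio $M_X(t) e^{-tx}$. The function names, the seed, the sample size and the threshold $c = 4$ are illustrative assumptions and do not come from the text; a plain Monte Carlo estimate is computed only for comparison.

```python
import math
import random
from statistics import NormalDist


def tail_prob_is(c, n, rng):
    """IS estimate of P(X > c) for X ~ N(0, 1), sampling from the tilted density N(t, 1)."""
    t = c                     # for the standard normal mu_t = t, so mu_t = c gives t = c
    log_mgf = 0.5 * t * t     # ln M_X(t) = t^2 / 2
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)                   # draw X_i under g_t
        if x > c:                               # h(x) = I{x > c}
            total += math.exp(log_mgf - t * x)  # likelihood ratio M_X(t) exp(-t x)
    return total / n


def tail_prob_naive(c, n, rng):
    """Plain Monte Carlo estimate of P(X > c), for comparison only."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > c)
    return hits / n


rng = random.Random(2024)            # fixed seed, chosen arbitrarily
c = 4.0                              # illustrative threshold
exact = 1.0 - NormalDist().cdf(c)    # reference value from the normal df
est_is = tail_prob_is(c, 20_000, rng)
est_naive = tail_prob_naive(c, 20_000, rng)
print(f"exact {exact:.3e}  IS {est_is:.3e}  naive {est_naive:.3e}")
```

With a threshold this far in the tail, the plain estimator typically sees no or very few exceedances at this sample size, whereas roughly half of the tilted draws fall above $c$.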


An abstract view of importance sampling. To handle the more complex application 
to portfolio credit risk in the next section it helps to consider importance sampling 
from a slightly more general viewpoint. Given densities f and g as above, define 
probability measures P and Q by 


$$P(A) = \int_A f(x)\,dx \quad\text{and}\quad Q(A) = \int_A g(x)\,dx, \quad A \subset \mathbb{R}.$$

With this notation, (8.53) becomes $\theta = E^P(h(X)) = E^Q(h(X) r(X))$, so that $r(X)$
equals $dP/dQ$, the (measure-theoretic) density of $P$ with respect to $Q$. Using this
more abstract view, exponential tilting can be applied in more general situations:
given an rv $X$ on $(\Omega, \mathcal{F}, P)$ such that $M_X(t) = E^P(\exp(tX)) < \infty$, define the
measure $Q_t$ on $(\Omega, \mathcal{F})$ by

$$\frac{dQ_t}{dP} = \frac{\exp(tX)}{M_X(t)}, \quad\text{i.e.}\quad Q_t(A) = E^P\Bigl(\frac{\exp(tX)}{M_X(t)}; A\Bigr),$$

and note that $(dQ_t/dP)^{-1} = M_X(t)\exp(-tX) = r_t(X)$. The IS algorithm remains
essentially unchanged: simulate independent realizations $X_i$ under the measure $Q_t$
and set $\hat\theta_n^{\mathrm{IS}} = (1/n) \sum_{i=1}^n h(X_i) r_t(X_i)$ as before.


8.5.2 Application to Bernoulli-Mixture Models 


In this section we return to the subject of credit losses and consider a portfolio loss
of the form $L = \sum_{i=1}^m e_i Y_i$, where the $e_i$ are deterministic, positive exposures and
the $Y_i$ are default indicators with default probabilities $p_i$. We assume that $Y$ follows
a Bernoulli mixture model in the sense of Definition 8.10 with factor vector $\Psi$
and conditional default probabilities $p_i(\Psi)$. We study the problem of estimating
exceedance probabilities $\theta = P(L > c)$ for $c$ substantially larger than $E(L)$ using
importance sampling. This is useful for risk-management purposes, as, for
$c \approx q_\alpha(L)$, a good importance-sampling distribution for the computation of $P(L > c)$
also yields a substantial variance reduction for computing expected shortfall or
expected shortfall contributions.

We consider first the situation where the default indicators $Y_1, \ldots, Y_m$ are independent
and discuss subsequently the extension to the case of conditionally independent
default indicators. Our exposition is based on Glasserman and Li (2003a).
Independent default indicators. Here we use the more general IS approach outlined
at the end of the previous section. Set $\Omega = \{0, 1\}^m$, the state space of $Y$. The
probability measure $P$ is given by

$$P(\{y\}) = \prod_{i=1}^m p_i^{y_i} (1 - p_i)^{1 - y_i}, \quad y \in \{0, 1\}^m.$$

We need to understand how this measure changes under exponential tilting using $L$.
The moment-generating function of $L$ is easily calculated to be

$$M_L(t) = E\Bigl(\exp\Bigl(t \sum_{i=1}^m e_i Y_i\Bigr)\Bigr) = \prod_{i=1}^m E(e^{t e_i Y_i}) = \prod_{i=1}^m (e^{t e_i} p_i + 1 - p_i).$$

The measure $Q_t$ is given by $Q_t(\{y\}) = E^P(e^{tL}/M_L(t); Y = y)$ and hence

$$Q_t(\{y\}) = \frac{\exp(t \sum_{i=1}^m e_i y_i)}{M_L(t)} P(\{y\}) = \prod_{i=1}^m \frac{\exp(t e_i y_i)}{\exp(t e_i) p_i + 1 - p_i} p_i^{y_i} (1 - p_i)^{1 - y_i}.$$

Define new default probabilities by $q_{t,i} := \exp(t e_i) p_i / (\exp(t e_i) p_i + 1 - p_i)$. It follows
that $Q_t(\{y\}) = \prod_{i=1}^m q_{t,i}^{y_i} (1 - q_{t,i})^{1 - y_i}$, so that after exponential tilting the
default indicators remain independent but with new default probability $q_{t,i}$. Note
that $q_{t,i}$ tends to one for $t \to \infty$ and to zero for $t \to -\infty$, so that we can shift the
mean of $L$ to any point in $(0, \sum_{i=1}^m e_i)$.

In analogy with our previous discussion, for IS purposes, the optimal value of $t$
is chosen such that $E^{Q_t}(L) = c$, leading to the equation $\sum_{i=1}^m e_i q_{t,i} = c$.
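
As a minimal sketch of this step, the code below computes the tilted default probabilities $q_{t,i}$ and solves $\sum_i e_i q_{t,i} = c$ for $t$ by bisection, using the fact that the tilted mean is increasing in $t$. The function names, the bracketing strategy and the homogeneous test portfolio (100 unit exposures with default probability 1%) are assumptions made for illustration; the tilted probability is written in an algebraically equivalent logistic form to avoid overflow.

```python
import math


def tilted_prob(t, e_i, p_i):
    """q_{t,i} = exp(t e_i) p_i / (exp(t e_i) p_i + 1 - p_i), in logistic form."""
    z = t * e_i + math.log(p_i / (1.0 - p_i))
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def tilted_mean(t, exposures, probs):
    """Mean of L under Q_t, i.e. sum_i e_i q_{t,i}."""
    return sum(e * tilted_prob(t, e, p) for e, p in zip(exposures, probs))


def solve_tilt(c, exposures, probs, tol=1e-10):
    """Solve sum_i e_i q_{t,i} = c for t by bisection."""
    if not 0.0 < c < sum(exposures):
        raise ValueError("c must lie strictly between 0 and the total exposure")
    lo, hi = -1.0, 1.0
    while tilted_mean(lo, exposures, probs) > c:   # widen the bracket to the left
        lo *= 2.0
    while tilted_mean(hi, exposures, probs) < c:   # widen the bracket to the right
        hi *= 2.0
    for _ in range(200):                           # iteration cap as a safeguard
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid, exposures, probs) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


exposures = [1.0] * 100     # hypothetical unit exposures e_i
probs = [0.01] * 100        # hypothetical default probabilities p_i
c = 10.0                    # E(L) = 1 under P, so {L > c} is a rare event
t_c = solve_tilt(c, exposures, probs)
q = [tilted_prob(t_c, e, p) for e, p in zip(exposures, probs)]
print(t_c, q[0], tilted_mean(t_c, exposures, probs))
```

For this homogeneous portfolio the equation can be solved by hand: the tilted default probability must equal $c/m = 0.1$, which gives $e^t = 11$.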


Conditionally independent default indicators. The first step in the extension of
the importance-sampling approach to conditionally independent defaults is obvious:
given a realization $\psi$ of the economic factors, the conditional exceedance probability
$\theta(\psi) := P(L > c \mid \Psi = \psi)$ is estimated using the approach for independent
default indicators described above. We have the following algorithm.


Algorithm 8.26 (IS for conditional loss distribution).
(1) Given $\psi$, calculate the conditional default probabilities $p_i(\psi)$ according to
the particular model, and solve the equation

$$\sum_{i=1}^m e_i \frac{\exp(t e_i) p_i(\psi)}{\exp(t e_i) p_i(\psi) + 1 - p_i(\psi)} = c;$$

the solution $t = t(c, \psi)$ gives the optimal degree of tilting.


(2) Generate $n_1$ conditional realizations of the default vector $(Y_1, \ldots, Y_m)$. The
defaults of the companies are simulated independently, with the default probability
of the $i$th company given by

$$\frac{\exp(t(c, \psi) e_i) p_i(\psi)}{\exp(t(c, \psi) e_i) p_i(\psi) + 1 - p_i(\psi)}.$$

(3) Denote by $M_L(t, \psi) := \prod_{i=1}^m \{\exp(t e_i) p_i(\psi) + 1 - p_i(\psi)\}$ the conditional
moment-generating function of $L$. From the simulated default data construct
$n_1$ conditional realizations of $L = \sum_{i=1}^m e_i Y_i$ and label these $L^{(1)}, \ldots, L^{(n_1)}$.
Determine the IS estimator for the conditional loss distribution:

$$\hat\theta^{\mathrm{IS}}_{n_1}(\psi) = M_L(t(c, \psi), \psi) \frac{1}{n_1} \sum_{j=1}^{n_1} I_{\{L^{(j)} > c\}} \exp(-t(c, \psi) L^{(j)}).$$


In principle, the approach discussed above also applies in the more general situation
where the loss given default is random; all we need to assume is that the $L_i$
are conditionally independent given $\Psi$, as in Assumption (A1) of Section 8.4.3.
However, the actual implementation can become quite involved.
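
The three steps of Algorithm 8.26 can be sketched as follows for deterministic exposures. The code takes the conditional default probabilities $p_i(\psi)$ as given, solves for the tilting parameter by bisection, simulates defaults under the tilted probabilities and averages the weighted indicators. All names, the seed and the test portfolio are illustrative assumptions; the likelihood ratio is evaluated on the log scale to avoid overflow in $M_L(t, \psi)$.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def conditional_is(c, exposures, cond_probs, n1, rng):
    """IS estimate of P(L > c | Psi = psi), given the conditional probabilities p_i(psi)."""
    if not 0.0 < c < sum(exposures):
        raise ValueError("c must lie strictly between 0 and the total exposure")
    logits = [math.log(p / (1.0 - p)) for p in cond_probs]

    def tilted(t):
        # q_{t,i} = exp(t e_i) p_i / (exp(t e_i) p_i + 1 - p_i), in logistic form
        return [sigmoid(t * e + a) for e, a in zip(exposures, logits)]

    def tilted_mean(t):
        return sum(e * q for e, q in zip(exposures, tilted(t)))

    # Step (1): solve sum_i e_i q_{t,i} = c by bisection (left side increases in t).
    lo, hi = -1.0, 1.0
    while tilted_mean(lo) > c:
        lo *= 2.0
    while tilted_mean(hi) < c:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < c:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    # Step (2): defaults are simulated independently with the tilted probabilities.
    q = tilted(t)
    # Step (3): ln M_L(t, psi) = sum_i ln(exp(t e_i) p_i + 1 - p_i).
    log_mgf = sum(math.log1p(p * math.expm1(t * e)) for e, p in zip(exposures, cond_probs))
    total = 0.0
    for _ in range(n1):
        loss = sum(e for e, qi in zip(exposures, q) if rng.random() < qi)
        if loss > c:
            total += math.exp(log_mgf - t * loss)
    return total / n1


rng = random.Random(11)          # fixed seed, chosen arbitrarily
exposures = [1.0] * 100          # hypothetical unit exposures
cond_probs = [0.01] * 100        # hypothetical values of p_i(psi) for one fixed psi
theta_hat = conditional_is(10.0, exposures, cond_probs, n1=5000, rng=rng)
print(theta_hat)
```

In this homogeneous example the conditional loss is binomial, so the estimate can be checked against the exact binomial tail probability.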


IS for the distribution of the factor variables. Suppose we now want to estimate the
unconditional probability $\theta = P(L > c)$. A naive approach would be to generate
realizations of the factor vector $\Psi$ and to estimate $\theta$ by averaging the IS estimator of
Algorithm 8.26 over these realizations. As is shown in Glasserman and Li (2003a),
this is not the best solution for large portfolios of dependent credit risks. Intuitively,
this is due to the fact that for such portfolios most of the variation in $L$ is caused by
fluctuations of the economic factors, and we have not yet applied IS to the distribution
of $\Psi$. For this reason we now discuss a full IS algorithm that combines IS for the
economic factor variables with Algorithm 8.26.

We consider the important case of a Bernoulli mixture model with multivariate
Gaussian factors and conditional default probabilities $p_i(\Psi)$ for $\Psi \sim N_p(0, \Omega)$,
such as the probit-normal Bernoulli mixture model described by (8.45). In this context
it is natural to choose an importance-sampling density such that $\Psi \sim N_p(\mu, \Omega)$
for a new mean vector $\mu \in \mathbb{R}^p$, i.e. we take $g$ as the density of $N_p(\mu, \Omega)$. For a
good choice of $\mu$ we expect to generate realizations of $\Psi$ leading to high conditional
default probabilities more frequently. The corresponding likelihood ratio $r_\mu(\Psi)$ is
given by the ratio of the respective multivariate normal densities, so that

$$r_\mu(\Psi) = \frac{\exp(-\tfrac12 \Psi' \Omega^{-1} \Psi)}{\exp(-\tfrac12 (\Psi - \mu)' \Omega^{-1} (\Psi - \mu))} = \exp(-\mu' \Omega^{-1} \Psi + \tfrac12 \mu' \Omega^{-1} \mu).$$

Essentially, this is a multivariate analogue of the exponential tilting applied to a
univariate normal distribution in Example 8.25.

Now we can describe the algorithm for full IS. At the outset we have to choose
the overall number of simulation rounds, $n$, the number of repetitions of conditional
IS per simulation round, $n_1$, and the mean of the IS distribution for the factors, $\mu$.
Whereas the value of $n$ depends on the desired degree of precision and is best
determined in a simulation study, $n_1$ should be taken to be fairly small. An approach
to determine a sensible value of $\mu$ is discussed below.


Algorithm 8.27 (full IS for mixture models with Gaussian factors).


(1) Generate $\Psi_1, \ldots, \Psi_n \sim N_p(\mu, \Omega)$.
(2) For each $\Psi_i$ calculate $\hat\theta^{\mathrm{IS}}_{n_1}(\Psi_i)$ as in Algorithm 8.26.
(3) Determine the full IS estimator:

$$\hat\theta^{\mathrm{IS}}_n = \frac{1}{n} \sum_{i=1}^n r_\mu(\Psi_i) \hat\theta^{\mathrm{IS}}_{n_1}(\Psi_i).$$
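
The sketch below combines the two layers of importance sampling for the simplest possible setting, a single standard normal factor ($p = 1$, $\Omega = 1$). Several ingredients are assumptions made purely for illustration and are not taken from the text: the one-factor probit-normal parametrization of $p_i(\psi)$ with hypothetical parameter `rho`, the homogeneous portfolio, the hand-picked IS mean `mu = 2.0`, and the safeguard that sets $t = 0$ (no tilting) whenever the conditional mean of $L$ already reaches $c$, which is not part of Algorithm 8.26 as stated.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cond_default_probs(psi, thresholds, rho):
    """Hypothetical probit-normal model: p_i(psi) = Phi((d_i + sqrt(rho) psi) / sqrt(1 - rho))."""
    a, s = math.sqrt(rho), math.sqrt(1.0 - rho)
    return [min(max(STD_NORMAL.cdf((d + a * psi) / s), 1e-12), 1.0 - 1e-12)
            for d in thresholds]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0.0 else math.exp(z) / (1.0 + math.exp(z))


def conditional_is(c, exposures, probs, n1, rng):
    """Conditional IS estimate of P(L > c | psi); t = 0 if E(L | psi) >= c (safeguard)."""
    logits = [math.log(p / (1.0 - p)) for p in probs]

    def tilted(t):
        return [sigmoid(t * e + a) for e, a in zip(exposures, logits)]

    def mean(t):
        return sum(e * q for e, q in zip(exposures, tilted(t)))

    t = 0.0
    if mean(0.0) < c:
        lo, hi = 0.0, 1.0
        while mean(hi) < c:
            hi *= 2.0
        for _ in range(40):                      # bisection for sum_i e_i q_{t,i} = c
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if mean(mid) < c else (lo, mid)
        t = 0.5 * (lo + hi)
    q = tilted(t)
    log_mgf = sum(math.log1p(p * math.expm1(t * e)) for e, p in zip(exposures, probs))
    total = 0.0
    for _ in range(n1):
        loss = sum(e for e, qi in zip(exposures, q) if rng.random() < qi)
        if loss > c:
            total += math.exp(log_mgf - t * loss)
    return total / n1


def full_is(c, exposures, thresholds, rho, mu, n, n1, rng):
    """Full IS estimator: average of r_mu(Psi_i) times the conditional IS estimate."""
    if not 0.0 < c < sum(exposures):
        raise ValueError("c must lie strictly between 0 and the total exposure")
    total = 0.0
    for _ in range(n):
        psi = rng.gauss(mu, 1.0)                       # step (1): factor drawn from N(mu, 1)
        ratio = math.exp(-mu * psi + 0.5 * mu * mu)    # r_mu(psi) for p = 1, Omega = 1
        probs = cond_default_probs(psi, thresholds, rho)
        total += ratio * conditional_is(c, exposures, probs, n1, rng)   # steps (2), (3)
    return total / n


rng = random.Random(7)                          # fixed seed, chosen arbitrarily
m = 50
exposures = [1.0] * m                           # hypothetical unit exposures
thresholds = [STD_NORMAL.inv_cdf(0.02)] * m     # 2% unconditional default probability
rho, c = 0.2, 5.0                               # hypothetical model parameter and threshold
theta_hat = full_is(c, exposures, thresholds, rho, mu=2.0, n=1000, n1=10, rng=rng)
print(theta_hat)
```

Because the example portfolio is homogeneous, the estimate can be compared with a direct numerical integration of the conditional binomial tail against the standard normal density.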


Choosing $\mu$. A key point in the full IS approach is the determination of a good
value for $\mu$, which leads to a low variance of the importance-sampling estimator.
Here we sketch the solution proposed by Glasserman and Li (2003a). Since
$\hat\theta^{\mathrm{IS}}_{n_1}(\psi) \approx P(L > c \mid \Psi = \psi)$, applying IS to the factors essentially amounts to
finding a good importance-sampling density for the function $\psi \mapsto P(L > c \mid \Psi = \psi)$.
Now recall from our discussion in the previous section that the optimal IS
density $g^*$ satisfies

$$g^*(\psi) \propto P(L > c \mid \Psi = \psi) \exp(-\tfrac12 \psi' \Omega^{-1} \psi), \tag{8.58}$$

“$\propto$” standing for “proportional to”. Sampling from that density is obviously not
feasible, as the normalizing constant involves the exceedance probability $P(L > c)$
that we are interested in. In this situation the authors suggest using a multivariate
normal density with the same mode as $g^*$ as an approximation to the optimal IS
density. Since a normal density attains its mode at the mean $\mu$, this amounts to
choosing $\mu$ as the solution to the optimization problem

$$\max_{\psi} P(L > c \mid \Psi = \psi) \exp(-\tfrac12 \psi' \Omega^{-1} \psi). \tag{8.59}$$

An exact (numerical) solution of (8.59) is difficult because the function
$P(L > c \mid \Psi = \psi)$ is usually not available in closed form. Glasserman and Li (2003a) discuss
several approaches to overcoming this difficulty; see their paper for details.


Notes and Comments 


Our discussion of IS for credit portfolios follows Glasserman and Li (2003a) closely. 
Theoretical results on the asymptotics of the IS estimator for large portfolios and 
numerical case studies contained in Glasserman and Li (2003a) indicate that full 
IS is a very useful tool for dealing with large Bernoulli mixture models. Merino 
and Nyfeler (2003) and Kalkbrener, Lotter and Overbeck (2004) undertook related
work; the latter paper gives an interesting alternative solution to finding a reasonable
IS mean $\mu$ for the factors.

For a general introduction to importance sampling we refer to the excellent text- 
book by Glasserman (2003a) (see also Robert and Casella 1999). For applications 
of importance sampling to heavy-tailed distributions, where exponential families 
cannot be applied directly, see Asmussen, Binswanger and Højgaard (2000) and
Glasserman, Heidelberger and Shahabuddin (1999). 

As an alternative to simulation one can try to determine analytic approximations 
for the loss distributions. Applications of the saddle-point approximation (see Jensen 
1995) are discussed in Martin, Thompson and Browne (2001) and Gordy (2002). 
8.6 Statistical Inference for Mixture Models


In this section we consider the statistical estimation of model parameters from 
historical default data for the kind of mixture models described in Section 8.4. This 
is quite a specific issue in the general area of statistical inference for credit risk 
models and the reader seeking more general literature should consult Notes and 
Comments. Before turning to statistical methods we provide a word of motivation 
for the approach we take in this section. 


8.6.1 Motivation 


The calibration of portfolio credit risk models used in industry (such as KMV, 
CreditMetrics or CreditRisk+) has, in general, not relied on the formal statistical 
estimation of model parameters from historical default and migration data. There 
are good reasons for this, the main one being that, particularly for higher-rated 
companies, there are simply not enough relevant data on historical defaults to obtain 
reliable parameter estimates by formal inference alone. 

Industry approaches generally separate the problems of estimating (i) default 
probabilities and (ii) additional model parameters describing the dependence of 
defaults. The default probability of an individual company is usually estimated by 
an appropriate historical default rate for “similar companies”, where the similarity 
metric may be based on a credit-rating system (CreditMetrics) or a proprietary 
measure like distance-to-default (DD) in the case of KMV. 

It is in the determination of other model parameters that current industry models 
are much less “formally statistical”. While most industry models postulate plausi- 
ble factor-model structures for the mechanism generating default dependence, the 
parameters of these factor models are very often either simply “assigned” by eco- 
nomic arguments or determined by auxiliary factor analyses of proxy variables. To 
give an example of the latter, some threshold models that equate the critical variable 
with a change in asset value (in the style of Merton’s model) calibrate the factor 
model by taking equity returns as a proxy for asset-value changes and fitting a factor 
model to equity returns. 

The ad hoc nature of such approaches raises the question of how much confidence 
can be placed in the model parameters thus derived, and how much model risk 
remains? For example, in a model of KMV/CreditMetrics type, how confident are 
we that we have correctly determined the size of the systematic risk component (8.26) 
due to the factors? In Section 8.3.5 we showed that there is considerable model risk 
associated with the size of the specific risk component, particularly when the tail of 
a credit loss distribution is of central importance. 

In this final section of this chapter we describe methods for the pure statistical 
estimation of all model parameters from default data. Currently, such an approach is 
perhaps only feasible for lower-grade credit risks where historical databases contain 
sufficient material to estimate parameters relating to default probability as well as 
parameters relating to default dependence. This picture may change as data become 
more plentiful over the years. Moreover, someone who grasps the principles of 
model fitting in this section will see that current industry approaches could even be
combined with the approach of this section to yield a hybrid methodology. More 
explicitly, components of the factor-model structure could be based on external 
inputs from industry models, while key parameters, such as parameters governing 
the overall sensitivity to systematic effects, could be statistically estimated from 
historical data. 

The models we describe are motivated by the format of the data we consider, which 
can be described as repeated cross-sectional data. This kind of data, comprising 
observations of the default or non-default of groups of monitored companies in a 
number of time periods, is readily available from rating agencies. Since the group of 
companies may differ from period to period, as new companies are rated and others 
default or cease to be rated, we have a cross-section of companies in each period, but 
the cross-section may change from period to period. A different kind of data that we 
do not consider would be panel data or repeated-measures data (particularly panels 
of ratings) for individual companies that are actively followed over time. 

Our examples are relatively simple, but illustrate the main ideas. In Section 8.6.2 
we discuss the estimation of default probabilities and default correlations for homo- 
geneous groups, e.g. groups with the same credit rating. In Section 8.6.3 we consider 
more complicated one-factor models allowing more heterogeneity and make a link 
to the important class of generalized linear mixed models (GLMMs) used in many 
statistical applications; an example is given in Section 8.6.4. 


8.6.2 Exchangeable Bernoulli-Mixture Models 


Suppose that we observe historical default numbers over $n$ periods of time for a
homogeneous group; typically these might be yearly data. For $t = 1, \ldots, n$, let
$m_t$ denote the number of observed companies at the start of period $t$ and let $M_t$
denote the number that defaulted during the period; the former will be treated as
fixed at the outset of the period and the latter as an rv. Suppose further that within
a time period these defaults are generated by an exchangeable Bernoulli mixture
model of the kind described in Section 8.4.1. In other words, assume that, given
some mixing variable $Q_t$ taking values in $(0, 1)$ and the cohort size $m_t$, the number
of defaults $M_t$ is conditionally binomially distributed and satisfies
$M_t \mid Q_t = q \sim B(m_t, q)$. Further assume that the mixing variables $Q_1, \ldots, Q_n$ are identically
distributed. We consider two methods for estimating the fundamental parameters
of the mixing distribution $\pi = \pi_1$, $\pi_2$ and $\rho_Y$ (default correlation); these are the
method of moments and the maximum likelihood method.


A simple moment estimator. For $1 \le t \le n$, let $Y_{t,1}, \ldots, Y_{t,m_t}$ be default indicators
for the $m_t$ companies in the cohort. Suppose we define the rv

$$\binom{M_t}{k} := \sum_{\{i_1, \ldots, i_k\} \subset \{1, \ldots, m_t\}} Y_{t,i_1} \cdots Y_{t,i_k}; \tag{8.60}$$

this represents the number of possible subgroups of $k$ obligors among the defaulting
obligors in period $t$ (and takes the value zero when $k > M_t$). By taking expectations
in (8.60) we get

$$E\binom{M_t}{k} = \binom{m_t}{k} \pi_k$$

and hence

$$\pi_k = E\biggl(\binom{M_t}{k} \Big/ \binom{m_t}{k}\biggr).$$

We estimate the unknown theoretical moment $\pi_k$ by taking a natural empirical
average (8.61) constructed from the $n$ years of data:

$$\hat\pi_k = \frac{1}{n} \sum_{t=1}^n \frac{\binom{M_t}{k}}{\binom{m_t}{k}} = \frac{1}{n} \sum_{t=1}^n \frac{M_t(M_t - 1) \cdots (M_t - k + 1)}{m_t(m_t - 1) \cdots (m_t - k + 1)}. \tag{8.61}$$

For $k = 1$ we get the standard estimator of default probability

$$\hat\pi = \frac{1}{n} \sum_{t=1}^n \frac{M_t}{m_t},$$

and $\rho_Y$ can obviously be estimated by taking $\hat\rho_Y = (\hat\pi_2 - \hat\pi^2)/(\hat\pi - \hat\pi^2)$. The estimator
$\hat\pi_k$ is unbiased for $\pi_k$ and consistent as $n \to \infty$ (for more details see Frey and
McNeil (2001)). Note that, for $Q_t$ random, consistency requires observations for a
large number of years; it is not sufficient to observe a large pool in a single year.
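
A minimal sketch of the moment estimator (8.61) and the implied default correlation estimate follows. The yearly default counts and cohort sizes are invented for illustration, and the function names are assumptions.

```python
def moment_estimate(defaults, cohort_sizes, k):
    """Empirical k-th moment (8.61): average over years of
    M_t (M_t - 1) ... (M_t - k + 1) / (m_t (m_t - 1) ... (m_t - k + 1))."""
    total = 0.0
    for M, m in zip(defaults, cohort_sizes):
        ratio = 1.0
        for j in range(k):
            ratio *= (M - j) / (m - j)    # a zero factor appears whenever k > M
        total += ratio
    return total / len(defaults)


def default_correlation(pi1, pi2):
    """Moment estimate of rho_Y from the first two moment estimates."""
    return (pi2 - pi1 ** 2) / (pi1 - pi1 ** 2)


# Hypothetical data: yearly default counts M_t and cohort sizes m_t.
defaults = [2, 0, 3, 1, 5, 0, 9, 1]
cohort_sizes = [100, 120, 110, 90, 130, 125, 140, 115]

pi1 = moment_estimate(defaults, cohort_sizes, 1)
pi2 = moment_estimate(defaults, cohort_sizes, 2)
rho_y = default_correlation(pi1, pi2)
print(pi1, pi2, rho_y)
```

The sketch assumes every cohort contains at least $k$ companies, so that no denominator factor vanishes.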


Maximum likelihood estimators. To implement a maximum likelihood (ML) procedure
we assume a simple parametric form for the density of the $Q_t$ (such as beta,
logit-normal or probit-normal). The joint probability function of the default counts
$M_1, \ldots, M_n$ given the cohort sizes $m_1, \ldots, m_n$ can then be calculated using (8.33),
under the assumption that the $Q_t$ variables in different years are independent. This
expression is then maximized with respect to the natural parameters of the mixing
distribution (i.e. $a$ and $b$ in the case of beta and $\mu$ and $\sigma$ for the logit-normal and
probit-normal). Of course, independence may be an unrealistic assumption for the
mixing variables, due to the phenomenon of economic cycles, but the method could
then be regarded as a quasi-maximum likelihood (QML) procedure, which misspecifies
the serial dependence structure but correctly specifies the marginal distribution
of defaults in each year and still gives reasonable parameter estimates.

In practice, it is easiest to use the beta mixing distribution, since, in this case,
given the group size $m_t$ in period $t$, the rv $M_t$ has a beta-binomial distribution with
probability function given in (8.35). The likelihood to be maximized thus takes the
form

$$L(a, b; \text{data}) = \prod_{t=1}^n \binom{m_t}{M_t} \frac{\beta(a + M_t, b + m_t - M_t)}{\beta(a, b)},$$

and maximization can be performed numerically with respect to $a$ and $b$. For further
information about the ML method consult, as usual, Section A.3. The ML estimates
of $\pi = \pi_1$, $\pi_2$ and $\rho_Y$ are calculated by evaluating moments of the fitted distribution
using (8.34); the formulas are given in Example 8.13.
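
The following sketch maximizes the beta-binomial log-likelihood numerically. The data are invented, and the optimizer is a deliberately crude compass search over the reparametrization $(\operatorname{logit} \pi, \ln(a + b))$, chosen here only to keep the example free of external libraries; in practice a standard numerical optimizer would be used instead. The moments of the fitted beta distribution are computed from the standard identities $\pi = a/(a+b)$, $\pi_2 = \pi (a+1)/(a+b+1)$ and $\rho_Y = 1/(a+b+1)$.

```python
import math


def log_beta(x, y):
    """Logarithm of the beta function."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_likelihood(a, b, defaults, cohort_sizes):
    """Logarithm of the beta-binomial likelihood L(a, b; data)."""
    ll = 0.0
    for M, m in zip(defaults, cohort_sizes):
        log_binom = math.lgamma(m + 1) - math.lgamma(M + 1) - math.lgamma(m - M + 1)
        ll += log_binom + log_beta(a + M, b + m - M) - log_beta(a, b)
    return ll


def fit_beta_binomial(defaults, cohort_sizes, max_evals=20_000):
    """Crude compass search over x = (logit pi, ln(a + b)); illustration only."""
    def objective(x):
        if abs(x[0]) > 30.0 or abs(x[1]) > 30.0:     # keep a and b in a safe range
            return -math.inf
        pi = 1.0 / (1.0 + math.exp(-x[0]))
        s = math.exp(x[1])
        return log_likelihood(pi * s, (1.0 - pi) * s, defaults, cohort_sizes)

    rate = sum(defaults) / sum(cohort_sizes)         # pooled default rate as start value
    x = [math.log(rate / (1.0 - rate)), math.log(10.0)]
    fx, step, evals = objective(x), 1.0, 0
    while step > 1e-7 and evals < max_evals:
        improved = False
        for i in (0, 1):
            for d in (step, -step):
                y = list(x)
                y[i] += d
                fy = objective(y)
                evals += 1
                if fy > fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step *= 0.5                              # refine the mesh
    pi = 1.0 / (1.0 + math.exp(-x[0]))
    s = math.exp(x[1])
    return pi * s, (1.0 - pi) * s


# Hypothetical, clearly overdispersed yearly default counts for cohorts of 100 firms.
defaults = [0, 1, 8, 2, 0, 12, 3, 1, 0, 5]
cohort_sizes = [100] * 10

a_hat, b_hat = fit_beta_binomial(defaults, cohort_sizes)
pi_hat = a_hat / (a_hat + b_hat)
pi2_hat = pi_hat * (a_hat + 1.0) / (a_hat + b_hat + 1.0)
rho_hat = 1.0 / (a_hat + b_hat + 1.0)
print(a_hat, b_hat, pi_hat, pi2_hat, rho_hat)
```

If the data show no overdispersion relative to the binomial distribution, the likelihood keeps increasing as $a + b$ grows and the search stops at the boundary of the safe range rather than at an interior maximum.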
A comparison of moment estimation and ML estimation. To compare these two
approaches we conduct a simulation study summarized in Table 8.7. To generate
data in the simulation study we consider the beta, probit-normal and logit-normal
mixture models of Section 8.4.1. In any single experiment we generate 20 years of
data using parameter values that roughly correspond to one of the Standard & Poor’s
credit ratings CCC, B or BB (see Table 8.6 for the parameter values). The number of
firms $m_t$ in each of the years is generated randomly using a binomial-beta model to
give a spread of values typical of real data; the defaults are then generated using one
of the Bernoulli mixture models and estimates of $\pi$, $\pi_2$ and $\rho_Y$ are calculated. The
experiment is repeated 5000 times and a relative root mean square error (RRMSE)
is estimated for each parameter and each method: that is, we take the square root of
the estimated MSE and divide by the true parameter value. Methods are compared
by calculating the percentage increase of the estimated RRMSE with respect to the
better method (i.e. the RRMSE minimizing method) for each parameter.

It may be concluded from Table 8.7 that the ML method is better in all but one 
experiment. Surprisingly, it is better even in the experiments when it is misspecified 
and the true mixing distribution is either probit-normal or logit-normal; in fact, 
in these cases, it offers more of an improvement than in the beta case. This can 
partly be explained by the fact that when we constrain well-behaved, unimodal 
mixing distributions with densities to have the same first and second moments, 
these distributions are very similar (see Figure 8.3). Finally, we observe that the 
ML method tends to outperform the moment method more as we increase the credit 
quality, so that defaults become rarer. 


8.6.3 Mixture Models as GLMMs 


A one-factor Bernoulli mixture model. Recall the simple one-factor model (8.36)
generalizing the exchangeable model in Section 8.4.1. Rewriting slightly, this has
the form

$$p_i(\Psi) = h(\mu + \beta' x_i + \Psi), \tag{8.62}$$

where $h$ is a link function, the vector $x_i$ contains covariates for the $i$th firm, such
as indicators for group membership or key balance sheet ratios, and $\beta$ and $\mu$ are
model parameters. Examples of link functions include the standard normal df $\Phi(x)$
and the logistic df $(1 + \exp(-x))^{-1}$. The scale parameter $\sigma$ has been subsumed in
the normally distributed random variable $\Psi \sim N(0, \sigma^2)$, representing a common
or systematic factor.

This model can be turned into a multiperiod model for default counts in different
periods, by considering that a series of mixing variables $\Psi_1, \ldots, \Psi_n$ generate default
dependence in each time period $t = 1, \ldots, n$. The default indicator $Y_{t,i}$ for the $i$th
company in time period $t$ is assumed to be Bernoulli with default probability $p_{t,i}(\Psi_t)$
depending on $\Psi_t$ according to

$$p_{t,i}(\Psi_t) = h(\mu + x_{t,i}' \beta + \Psi_t), \tag{8.63}$$

where $\Psi_t \sim N(0, \sigma^2)$ and $x_{t,i}$ are covariates for the $i$th company in time period $t$.
Moreover, the default indicators $Y_{t,1}, \ldots, Y_{t,m_t}$ in period $t$ are assumed to be
conditionally independent given $\Psi_t$.

To complete the model we need to specify the joint distribution of $\Psi_1, \ldots, \Psi_n$,
and it is easiest to assume that these are iid mixing variables. To capture possible
economic cycle effects causing dependence between numbers of defaults in successive
time periods one could either enter covariates at the level of $x_{t,i}$ that are known
to be good proxies for “state of the economy”, such as changes in GDP over the time
period, or an index like the Chicago Fed National Activity Index (CFNAI) in the US,
or one could consider a serially dependent time series structure for the systematic
factors $(\Psi_t)$.


Table 8.7. Each part of the table relates to a block of 5000 simulations using a particular
exchangeable Bernoulli mixture model with parameter values roughly corresponding to a
particular S&P rating class. For each parameter of interest, an estimated RRMSE is tabulated
for both estimation methods: moment estimation using (8.61) and ML estimation based on the
beta model. Methods can be compared by using $\Delta$, the percentage increase of the estimated
RRMSE with respect to the better method (i.e. the RRMSE minimizing method) for each
parameter. Thus, for each parameter the better method has $\Delta = 0$. The table clearly shows
that MLE is better in all but one case.

                                        Moment              MLE-beta
Group  True model     Parameter     RRMSE   $\Delta$     RRMSE   $\Delta$
CCC    Beta           $\pi$         0.101      0         0.101      0
CCC    Beta           $\pi_2$       0.202      0         0.201      0
CCC    Beta           $\rho_Y$      0.332      5         0.317      0
CCC    Probit-normal  $\pi$         0.100      0         0.100      0
CCC    Probit-normal  $\pi_2$       0.205      1         0.204      0
CCC    Probit-normal  $\rho_Y$      0.347     11         0.314      0
CCC    Logit-normal   $\pi$         0.101      0         0.101      0
CCC    Logit-normal   $\pi_2$       0.209      1         0.208      0
CCC    Logit-normal   $\rho_Y$      0.357     11         0.320      0
B      Beta           $\pi$         0.130      0         0.130      0
B      Beta           $\pi_2$       0.270      0         0.269      0
B      Beta           $\rho_Y$      0.396      8         0.367      0
B      Probit-normal  $\pi$         0.130      0         0.130      0
B      Probit-normal  $\pi_2$       0.286      3         0.277      0
B      Probit-normal  $\rho_Y$      0.434     19         0.364      0
B      Logit-normal   $\pi$         0.131      0         0.132      0
B      Logit-normal   $\pi_2$       0.308      7         0.289      0
B      Logit-normal   $\rho_Y$      0.493     26         0.392      0
BB     Beta           $\pi$         0.199      0         0.199      0
BB     Beta           $\pi_2$       0.435      0         0.438      1
BB     Beta           $\rho_Y$      0.508      7         0.476      0
BB     Probit-normal  $\pi$         0.197      0         0.197      0
BB     Probit-normal  $\pi_2$       0.492     10         0.446      0
BB     Probit-normal  $\rho_Y$      0.607     27         0.480      0
BB     Logit-normal   $\pi$         0.196      0         0.196      0
BB     Logit-normal   $\pi_2$       0.572     24         0.462      0
BB     Logit-normal   $\rho_Y$      0.752     45         0.517      0


A one-factor Poisson mixture model. When considering higher-grade portfolios
of companies with relatively low default risk, there may sometimes be advantages
(particularly in the stability of fitting procedures) in formulating Poisson mixture
models instead of Bernoulli mixture models. A multiperiod mixture model based on
Definition 8.11 can be constructed by assuming that the default count variable $\tilde{Y}_{t,i}$ for
the $i$th company in time period $t$ is conditionally Poisson with rate parameter $\lambda_{t,i}(\Psi_t)$
depending on $\Psi_t$ according to

$$\lambda_{t,i}(\Psi_t) = \exp(\mu + x_{t,i}' \beta + \Psi_t), \tag{8.64}$$

with all other elements of the model as in (8.63). Again the variables $\tilde{Y}_{t,1}, \ldots, \tilde{Y}_{t,m_t}$
are assumed to be conditionally independent given $\Psi_t$.


GLMMs. Both the multiperiod Bernoulli and Poisson mixture models in (8.63) 
and (8.64) belong to a family of widely used statistical models known as generalized 
linear mixed models (GLMMs). The three basic elements of such a model are as 
follows. 


(1) The vector of random effects. In our examples this is the vector $(\Psi_1, \ldots, \Psi_n)$
containing the systematic factors for each time period.


(2) A distribution from the exponential family for the conditional distribution of
the responses ($Y_{t,i}$ or $\tilde{Y}_{t,i}$) given the random effects. Responses are assumed to
be conditionally independent given the random effects. The Bernoulli, binomial
and Poisson distributions all belong to the exponential family (see, for
example, McCullagh and Nelder 1989, p. 28).


(3) A link function relating $E(Y_{t,i} \mid \Psi)$, the mean response conditional on the
random effects, to the so-called linear predictor. In our examples the linear
predictor for $Y_{t,i}$ is

$$\eta_{t,i}(\Psi_t) = \mu + x_{t,i}' \beta + \Psi_t. \tag{8.65}$$

We have considered the so-called probit and logit link functions in the
Bernoulli case and the log-link function in the Poisson case. (Note that it
is usual in GLMMs to write the model as $g(E(Y_{t,i} \mid \Psi)) = \eta_{t,i}(\Psi)$ and to
refer to $g$ as the link function; hence the probit link function is the quantile
function of standard normal and the link in the Poisson case (8.64) is referred
to as “log” rather than “exponential”.)


When no random effects are modelled in a GLMM, the model is simply known as a
generalized linear model or GLM. The role of the random effects in the GLMM is,
in a sense, to capture patterns of variability in the responses that cannot be explained
by the observed covariates alone, but which might be explained by additional unobserved
factors. In our case, these unobserved factors are bundled into a time-period
effect that we loosely describe as the state of the economy in that time period;
alternatively, we refer to it as the systematic risk.

The GLMM framework allows models of much greater complexity. We can
add further random effects to obtain multi-factor mixture models. For example,
we might know the industry sector of each firm and wish to include a random
effect for sector nested inside the year effect; in this way we might capture additional
variability associated with economic effects in different sectors over and
above the global variability associated with the year effect. Such models can be
considered in the GLMM framework by allowing the linear predictor in (8.65) to
take the form $\eta_{t,i}(\Psi_t) = \mu + x_{t,i}' \beta + z_{t,i}' \Psi_t$ for some vector of random effects
$\Psi_t = (\Psi_{t,1}, \ldots, \Psi_{t,p})'$; the vector $z_{t,i}$ is a known design element of the model that
picks out the random effects that are relevant to the response $Y_{t,i}$. We would then
have a total of $p \times n$ random effects in the model. We may or may not want to model
serial dependence in the time series $\Psi_1, \ldots, \Psi_n$.


Inference for GLMMs. Full ML inference for a GLMM is only a viable option
for the simplest models. Consider the form of the likelihood for the one-factor
models in (8.63) and (8.64). If we write $p_{Y_{t,i} \mid \Psi_t}(y \mid \psi)$ for the conditional probability
mass function of the response $Y_{t,i}$ (or $\tilde{Y}_{t,i}$) given $\Psi_t$, we have, for data
$\{Y_{t,i} : t = 1, \ldots, n,\ i = 1, \ldots, m_t\}$,

$$L(\beta, \sigma; \text{data}) = \int \cdots \int \Bigl( \prod_{t=1}^n \prod_{i=1}^{m_t} p_{Y_{t,i} \mid \Psi_t}(Y_{t,i} \mid \psi_t) \Bigr) f(\psi_1, \ldots, \psi_n)\, d\psi_1 \cdots d\psi_n, \tag{8.66}$$

where $f$ denotes the assumed joint density of the random effects. If we do not assume
independent random effects from time period to time period, then we are faced
with an $n$-dimensional integral (or an $(n \times p)$-dimensional integral in multi-factor
models). Assuming iid Gaussian random effects with marginal Gaussian density $f_\Psi$,
the likelihood (8.66) becomes

$$L(\beta, \sigma; \text{data}) = \prod_{t=1}^n \Bigl( \int \prod_{i=1}^{m_t} p_{Y_{t,i} \mid \Psi_t}(Y_{t,i} \mid \psi_t) f_\Psi(\psi_t)\, d\psi_t \Bigr), \tag{8.67}$$

so that we have a product of one-dimensional integrals and this can be easily evaluated
numerically and maximized over the unknown parameters. Alternatively, faster
approximate likelihood methods, such as penalized quasi-likelihood (PQL) and
marginal quasi-likelihood (MQL), can be used (see Notes and Comments).
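
To make the structure of (8.67) concrete, the sketch below evaluates the log-likelihood of a one-factor probit model in which firms sharing the same covariates are pooled into groups. The product of Bernoulli terms within a group then becomes a binomial term, which differs from it only by a factor that does not depend on the parameters. The group intercepts `mu`, the data, the grid-based trapezoidal rule and all function names are assumptions made for illustration; the integral is taken over the standardized variable $z = \psi / \sigma$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def log_binom_pmf(k, n, p):
    """Log of the binomial probability function, with p clipped away from 0 and 1."""
    p = min(max(p, 1e-300), 1.0 - 1e-16)
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def log_likelihood(mu, sigma, counts, sizes, n_grid=401, z_max=8.0):
    """Log of (8.67) for a probit link with group intercepts mu[r], Psi_t ~ N(0, sigma^2).

    counts[t][r] and sizes[t][r] hold defaults and cohort sizes for period t, group r.
    Each one-dimensional integral is approximated by the trapezoidal rule.
    """
    h = 2.0 * z_max / (n_grid - 1)
    grid = [-z_max + j * h for j in range(n_grid)]
    total = 0.0
    for M_t, m_t in zip(counts, sizes):
        integral = 0.0
        for j, z in enumerate(grid):
            log_cond = sum(log_binom_pmf(M, m, STD_NORMAL.cdf(mu_r + sigma * z))
                           for M, m, mu_r in zip(M_t, m_t, mu))
            weight = 0.5 * h if j in (0, n_grid - 1) else h
            integral += weight * math.exp(log_cond) * STD_NORMAL.pdf(z)
        total += math.log(integral)     # the product over periods becomes a sum of logs
    return total


# Hypothetical data: four periods, two groups.
counts = [[0, 3], [1, 7], [0, 2], [2, 11]]
sizes = [[200, 100], [210, 105], [190, 95], [205, 110]]
mu = [-2.5, -1.7]                       # hypothetical group intercepts
ll = log_likelihood(mu, 0.3, counts, sizes)
print(ll)
```

Passing this function to a numerical optimizer over `mu` and `sigma` would give the ML estimates. For very large cohorts the integrand can underflow, in which case the conditional log-likelihood should be rescaled before exponentiating.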
Another attractive possibility is to treat inference for these models from a Bayesian 
point of view and to use Markov Chain Monte Carlo (MCMC) methods to make 
inference about parameters. We believe that this holds particular promise for two 
main reasons. First, a Bayesian MCMC approach allows us to work with much 
more complex models than can be handled in the likelihood framework, such as a 
model with serially dependent random effects. Second, the Bayesian approach may 
be ideal for handling the considerable parameter uncertainty that we are currently 
faced with in portfolio credit risk, particularly in models for higher-rated counterparties
where default data are scarce. In the Bayesian approach, prior distributions
are used to express opinions about parameters before data analysis; these opinions
are then updated with the help of the data and Bayes’ theorem to arrive at a pos- 
terior distribution for the parameters. This mechanism could be used to combine 
the parameter information coming from non-statistical industry models with the 
evidence in historical data to achieve improved model calibration. 


8.6.4 One-Factor Model with Rating Effect 


In this section we fit a Bernoulli mixture model to annual default count data from 
Standard & Poor’s for the period 1981-2000; these data may be easily reconstructed 
from published default rates in Brand and Bahr (2001, Table 13, pp. 18-21). Stan- 
dard & Poor’s uses the ratings AAA, AA, A, BBB, BB, B, CCC, but because the 
observed one-year default rates for AAA-rated and AA-rated firms are mostly zero, 
we concentrate on the rating categories A to CCC. 

In our model we assume a single yearly random effect representing “state of the 
economy” and treat rating category as an observed covariate for each firm in each 
time period. Our model is a particular instance of the single-factor Bernoulli mixture 
model in (8.63) and a multiperiod extension of the model described in Example 8.14. 
We assume for simplicity that random effects in each year are iid normal, which 
allows us to use the likelihood (8.67). 

Since we are able to pool companies into groups by year and rating category, we 
note that it is possible to reformulate the model as a binomial mixture model. Let 
$r = 1, \ldots, 5$ index the five rating categories in our study and write $m_{t,r}$ for the
number of followed companies in year $t$ with rating $r$, and $M_{t,r}$ for the number of
these that default. Our model assumption is that, conditional on $\Psi_t$ (and the group
sizes), the default counts $M_{t,1}, \ldots, M_{t,5}$ are independent and distributed in such a
way that $M_{t,r} \mid \Psi_t = \psi \sim B(m_{t,r}, p_r(\psi))$. Using the probit link the conditional
default probability of an $r$-rated company in year $t$ is given by


$$p_r(\Psi_t) = \Phi(\mu_r + \Psi_t). \tag{8.68}$$


The model may be fitted under the assumption of iid random effects in each year by 
straightforward maximization of the likelihood in (8.67). The parameter estimates 
and obtained standard errors are given in Table 8.8, together with the estimated
default probabilities $\hat\pi^{(r)}$ for each rating category and estimated default correlations
$\hat\rho_Y^{(r_1,r_2)}$ implied by the parameter estimates. Writing $\Psi$ for a generic random effect
variable, the default probability for rating category $r$ is given by


$$\hat\pi^{(r)} = E(\hat p_r(\Psi)) = \int_{-\infty}^{\infty} \Phi(\hat\mu_r + \hat\sigma z)\phi(z)\,dz, \quad 1 \le r \le 5,$$


where $\phi$ is the standard normal density. The default correlation for two firms with
ratings $r_1$ and $r_2$ in the same year is calculated easily from the joint default probability
for these two firms, which is


$$\hat\pi_2^{(r_1,r_2)} = E(\hat p_{r_1}(\Psi)\hat p_{r_2}(\Psi)) = \int_{-\infty}^{\infty} \Phi(\hat\mu_{r_1} + \hat\sigma z)\Phi(\hat\mu_{r_2} + \hat\sigma z)\phi(z)\,dz.$$


Table 8.8. Maximum likelihood parameter estimates and standard errors (se) for a
one-factor Bernoulli mixture model fitted to historical Standard & Poor’s one-year default
data, together with the implied estimates of default probabilities $\hat\pi^{(r)}$ and default correlations
$\hat\rho_Y^{(r_1,r_2)}$. The MLE of the scaling parameter $\sigma$ is 0.24 with standard error 0.05. Note that we
have tabulated default correlation in absolute terms and not in percentage terms.

Parameter                        A        BBB       BB        B        CCC
$\hat\mu_r$                    -3.43    -2.92    -2.40    -1.69    -0.84
se($\hat\mu_r$)                 0.13     0.09     0.07     0.06     0.08

$\hat\pi^{(r)}$               0.0004   0.0023   0.0097   0.0503   0.2078

$\hat\rho_Y^{(r_1,r_2)}$         A        BBB       BB        B        CCC
A                            0.00040  0.00077  0.00130  0.00219  0.00304
BBB                          0.00077  0.00149  0.00255  0.00435  0.00615
BB                           0.00130  0.00255  0.00440  0.00763  0.01081
B                            0.00219  0.00435  0.00763  0.01328  0.01906
CCC                          0.00304  0.00615  0.01081  0.01906  0.02788


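The default correlation is then


$$\hat\rho_Y^{(r_1,r_2)} = \frac{\hat\pi_2^{(r_1,r_2)} - \hat\pi^{(r_1)}\hat\pi^{(r_2)}}{\sqrt{(\hat\pi^{(r_1)} - (\hat\pi^{(r_1)})^2)(\hat\pi^{(r_2)} - (\hat\pi^{(r_2)})^2)}}.$$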


Note that the default correlations are correlations between event indicators for very 
low probability events and are necessarily very small. 
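
The implied quantities in Table 8.8 can be recomputed from the parameter estimates by one-dimensional numerical integration. The following is a minimal sketch, assuming the probit model with a normally distributed random effect of standard deviation 0.24 and using the rounded estimates from the table; the function names and the Simpson-rule integrator are our own illustrative choices, and because the inputs are rounded the output agrees with the tabulated values only approximately.

```python
import math

# Rounded estimates from Table 8.8 (inputs are rounded, so results are approximate).
MU = {"A": -3.43, "BBB": -2.92, "BB": -2.40, "B": -1.69, "CCC": -0.84}
SIGMA = 0.24


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(g, lo=-10.0, hi=10.0, n=4000):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0


def default_prob(r):
    """Unconditional default probability: E[Phi(mu_r + sigma * Z)]."""
    return integrate(lambda z: Phi(MU[r] + SIGMA * z) * phi(z))


def joint_default_prob(r1, r2):
    """Joint default probability of two firms with ratings r1 and r2."""
    return integrate(
        lambda z: Phi(MU[r1] + SIGMA * z) * Phi(MU[r2] + SIGMA * z) * phi(z)
    )


def default_corr(r1, r2):
    """Default correlation between the two default indicators."""
    p1, p2 = default_prob(r1), default_prob(r2)
    p12 = joint_default_prob(r1, r2)
    return (p12 - p1 * p2) / math.sqrt((p1 - p1 ** 2) * (p2 - p2 ** 2))


if __name__ == "__main__":
    for r in MU:
        print(r, round(default_prob(r), 4), round(default_corr(r, r), 5))
```

For a normal random effect the integral for the default probability also has the closed form $\Phi(\hat\mu_r/\sqrt{1 + \hat\sigma^2})$, which gives a convenient check on the numerical integration.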

The model in (8.68) assumes that the variance of the systematic factor $\Psi_t$ is the
same for all firms in all years. When compared with the very general Bernoulli mix- 
ture model corresponding to CreditMetrics/KMV in (8.45), we might be concerned 
that the simple model considered in this section does not allow for enough hetero- 
geneity in the variance of the systematic risk. A simple extension of the model is to 
allow the variance to be different for different rating categories, that is to fit a model 
where $p_r(\Psi_t) = \Phi(\mu_r + \sigma_r \Psi_t)$ and $\Psi_t$ is a standard normally distributed random
effect. This increases the number of parameters in the model by four, but is no more 
difficult to fit than the basic model. The maximized value of the log-likelihood in 
the model with heterogeneous scaling is —2557.4, and the value in the model with 
homogeneous scaling is —2557.7; a likelihood ratio test suggests that no signifi- 
cant improvement results from allowing heterogeneous scaling. If rating is the only 
categorical variable, the simple model seems adequate but, if we had more informa- 
tion on the industrial and geographical sectors to which the companies belonged, it 
would be natural to introduce further random effects for these sectors and to allow 
more heterogeneity in the model in this way. 
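
The likelihood ratio comparison can be made explicit. The statistic is twice the difference of the maximized log-likelihoods, $2(-2557.4 - (-2557.7)) = 0.6$, and under the usual asymptotic approximation it is compared with a chi-squared distribution with four degrees of freedom, one for each additional scaling parameter. The sketch below uses only the two log-likelihood values quoted above; the helper name is our own, and it relies on the closed-form chi-squared tail that is available when the number of degrees of freedom is even.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-squared rv X with an even number df of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


loglik_heterogeneous = -2557.4  # model with a scaling parameter per rating category
loglik_homogeneous = -2557.7    # model with a single scaling parameter
extra_parameters = 4

lr_statistic = 2.0 * (loglik_heterogeneous - loglik_homogeneous)
p_value = chi2_sf_even_df(lr_statistic, extra_parameters)
print(lr_statistic, p_value)
```

The resulting p-value is about 0.96, far from any conventional significance level, which is why the simpler model is retained.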

The implied default probability and default correlation estimates in Table 8.8 can 
be a useful resource for calibrating simple credit models to homogeneous groups 
defined by rating. For example, to calibrate a Clayton copula to group BB we use
the inputs $\hat\pi^{(3)} = 0.0097$ and $\hat\rho_Y^{(3,3)} = 0.00440$ to determine the parameter $\theta$ of the
Clayton copula (see Example 8.22). Note also that we can now immediately use
the scaling results of Section 8.4.3 to calculate approximate risk measures for large
portfolios of companies that have been rated with the Standard & Poor’s system (see
Example 8.17). 
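
To indicate how such a calibration might proceed, the following sketch solves for $\theta$ numerically. It assumes the standard bivariate Clayton form $C_\theta(u, v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ and takes the joint default probability of two firms in the group to be $C_\theta(\pi, \pi)$; the details of Example 8.22 are not reproduced here and may differ. The function names and the bisection bounds are our own choices.

```python
def clayton_joint_default_prob(pi, theta):
    """Joint default probability C_theta(pi, pi) under a bivariate Clayton copula."""
    return (2.0 * pi ** (-theta) - 1.0) ** (-1.0 / theta)


def implied_default_corr(pi, theta):
    """Default correlation of two exchangeable default indicators."""
    pi2 = clayton_joint_default_prob(pi, theta)
    return (pi2 - pi * pi) / (pi - pi * pi)


def calibrate_clayton(pi, rho, lo=1e-4, hi=5.0, tol=1e-12):
    """Bisection for theta; the implied correlation increases with theta."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if implied_default_corr(pi, mid) < rho:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


theta_bb = calibrate_clayton(0.0097, 0.00440)
print(theta_bb, implied_default_corr(0.0097, theta_bb))
```

Bisection is used only because it is transparent; any one-dimensional root finder would serve equally well.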


Notes and Comments 


The estimator (8.61) for joint default probabilities is also used in Lucas (1995) and 
Nagpal and Bahar (2001), although de Servigny and Renault (2002) suggest there 
may be problems with this estimator for groups with low default rates. A related 
moment-style estimator has been suggested by Gordy (2000) but it appears to have 
a similar performance to (8.61) (see Frey and McNeil 2003). A further paper on 
default correlation estimation is Gordy and Heitfield (2002). 

A good overview article on generalized linear mixed models is Clayton (1996). 
For generalized linear models a standard reference is McCullagh and Nelder (1989) 
(see also Fahrmeir and Tutz 1994). 

The analysis of Section 8.6.4 is very similar to the analysis in Frey and McNeil 
(2003) (where heterogeneous variances for each rating category were assumed). 
While the results reported in this book were obtained by full maximization of the 
likelihood with our own code, we could also have used a number of existing software 
packages for GLMMs. For example, we have verified that the glme function in the
S-PLUS correlated data library gives very similar results using the default penalized
quasi-likelihood method; for more information about penalized quasi-likelihood and
the related marginal quasi-likelihood method, see Breslow and Clayton (1993). For a
Bayesian approach to fitting the model using Markov chain Monte Carlo techniques, 
see McNeil and Wendin (2003); this approach allows fairly complicated models to 
be fitted, including models where the random effects have an autoregressive time 
series structure. 

Although we have only described default models it is also possible to analyse 
rating migrations in the generalized linear model framework (with or without random 
effects). A standard model is the ordered probit model, which is used without random 
effects in Nickell, Perraudin and Varotto (2000) to provide evidence of time variation 
in default rates attributable to macroeconomic factors; a similar message is found 
in Bangia et al. (2002). Wendin and McNeil (2004) show how random effects may 
be included in such models and discuss Bayesian inference. 

A further strand of the literature is the modelling of rating-transition data with 
Markov chain methods. Lando and Skodeberg (2002) estimate Markov chains in 
continuous time from Standard & Poor’s data giving the exact dates of ratings 
transitions. They find evidence of non-Markovian behaviour in the data and raise the 
issue of “ratings momentum”, whereby information about the previous rating history 
of a company beyond its current rating is predictive of the risk of downgrading. See 
also Chapter 4 of Lando (2004) for more information on the Markov chain approach 
as well as the application of survival analysis methodology to default data. 

A number of authors have looked at models with latent structure to capture the 
dynamics of systematic risk; an example is Crowder, Davis and Giampieri (2005), 
who use a two-state hidden Markov structure to capture periods of high and low 
default risk (see also Gagliardini and Gourieroux 2005).

There is a huge amount of literature on the estimation of models for pricing
credit-risky securities that uses mainly data on corporate bond prices as input: Chapter 7
of Duffie and Singleton (2003) is a good starting point. Empirical work on the 
calibration of credit risk models to loan data, on the other hand, is relatively sparse, 
which is probably related to data problems. An example of the latter type of work 
is Altman and Suggitt (2000). 


9 Dynamic Credit Risk Models and Credit Derivatives


In this chapter we study credit risk models in continuous time and consider the pric- 
ing of credit derivatives in the framework of reduced-form models (see Section 8.1 
for an overview of model types). Reduced-form models are popular in practice, since 
they lead to tractable formulas explaining the price of credit-risky securities in terms 
of economic covariates, which facilitates estimation. Moreover, with reduced-form 
models it is possible to apply the well-developed pricing machinery for default-free 
term structure models to the analysis of defaultable securities. 

We begin with a brief introduction to credit derivatives in Section 9.1. These 
products have become indispensable tools for the management of credit risk, and 
the corresponding markets have seen a massive growth in recent years. We con- 
tinue with two preparatory sections: Section 9.2 contains mathematical tools for 
reduced-form models; Section 9.3 briefly introduces key concepts from mathemat- 
ical finance, thereby providing the methodological basis for our analysis. Partic- 
ular attention will be given to the distinction between real-world and risk-neutral 
default probabilities. Sections 9.4 and 9.5 review standard but indispensable mate- 
rial on the pricing of defaultable securities in reduced-form models; in particular, 
the relationship between pricing problems for defaultable and default-free securi- 
ties is discussed. Sections 9.6-9.8 are devoted to reduced-form models for credit 
portfolios. We begin with models with conditionally independent defaults. In this 
model class, default times are independent given the realization of some observable 
economic background process, making these models a straightforward extension 
of the static Bernoulli mixture models discussed in Chapter 8. More sophisticated 
models, where there is interaction between defaults in the sense that the default of 
one firm influences the conditional survival probability of the remaining firms in the 
portfolio, are discussed thereafter. 

In this book we have so far concentrated on static or discrete-time models and 
risk-management issues; continuous-time models and the pricing of derivatives have 
played only a minor role. In this chapter we make an exception for a number of 
reasons. First, credit risk management and credit derivatives are intimately linked. 
In fact, the quest of financial institutions for better tools to manage and diversify 
the credit risk in their portfolios and the recent developments in the regulation of 
credit risk are the main drivers of the tremendous growth of the credit derivatives 
market that we are currently witnessing. Since the pay-off of most credit derivatives
depends on the timing of default, dynamic credit risk models are clearly needed to
analyse these products. Second, the pricing of portfolio credit derivatives is a key 
area of application for many of the concepts for modelling dependent risks discussed 
in this book; in particular, copulas play a prominent role. Third, dynamic models 
of portfolio credit risk have recently generated a lot of interest in academia and in 
industry, and we want to offer our readership an introduction to the field. 

Credit risk modelling is a large area that cannot be covered in a few chapters, and 
consequently we have had to omit a lot of interesting and relevant material. Important 
omissions include advanced firm-value models (other than Merton); continuous- 
time models for rating transitions; an analysis of credit risk in interest-rate swaps; 
and forward-rate models of Heath–Jarrow–Morton type for corporate bonds. The
main reason for these omissions is our decision to focus on the growing field of 
dynamic portfolio credit risk models. Much of the material in this area is of recent 
vintage and to our knowledge has not been discussed extensively at textbook level 
before. 

There are several full textbook treatments of dynamic credit risk models including 
Bielecki and Rutkowski (2002), Bluhm, Overbeck and Wagner (2002), Duffie and 
Singleton (2003), Lando (2004) and Schönbucher (2003). Excellent survey articles
include Schmidt and Stute (2004) and Giesecke (2004). While each text has a dif- 
ferent focus, some overlap with the material treated here is unavoidable and will be 
indicated, together with suggestions for further reading, in Notes and Comments of 
the respective sections. 


9.1 Credit Derivatives 
9.1.1 Overview 


We find it convenient to divide the universe of credit-risky securities into three 
different types: vulnerable claims, single-name credit derivatives, and portfolio- 
related credit derivatives. Vulnerable claims are securities whose promised pay-off 
is not linked to credit events, but whose issuer may default. Hence the actual pay-off 
received by the buyer of the security (for instance, a counterparty in a swap transac- 
tion) is adversely affected by the default of the issuer. Important examples include 
corporate bonds and interest-rate swaps. While the pricing of certain vulnerable 
claims raises challenging issues, these products are of no concern to us here, as 
credit risk is not their primary focus; some references can be found in Notes and 
Comments. 

Credit derivatives are securities which are primarily used for the management and 
trading of credit risk. In the case of a single-name credit derivative the promised 
pay-off depends on the occurrence of a credit event affecting a single financial 
entity; otherwise the pay-off is related to credit events in a whole portfolio. Credit 
derivatives are a fairly young asset class and the market continues to evolve, with 
new products appearing frequently. Credit derivatives are traded over the counter, 
so the precise pay-off specification may vary a lot between contracts of similar 
type. Nonetheless, due to efforts of bodies such as the International Swaps and
Derivatives Association (ISDA), in recent years some standardization has taken place. Credit
derivatives have become popular because they help financial firms to manage the 
credit risk on their books by dispersing parts of it through the wider financial sec- 
tor, thereby reducing concentration of risk. In fact, the widespread use of these 
instruments may have enhanced the resilience of the overall financial system. In this 
context the following remarks made by Alan Greenspan in his speech before the 
Council on Foreign Relations in November 2002 (see Section 1.4.1) are of interest. 


More recently, instruments ... such as credit default swaps, collateral- 
ized debt obligations and credit-linked notes have been developed and 
their use has grown rapidly in recent years. The result? Improved credit 
risk management together with more and better risk-management tools 
appear to have significantly reduced loan concentrations in telecom- 
munications and, indeed, other areas and the associated stress on banks 
and other financial institutions. 


More generally, such instruments appear to have effectively spread 
losses from defaults by Enron, Global Crossing, Railtrack, WorldCom, 
Swissair, and sovereign Argentinian credits over the past year to a wider 
set of banks than might previously have been the case in the past, and 
from banks, which have largely short-term leverage, to insurance firms, 
pension funds or others with diffuse long-term liabilities or no liabili- 
ties at all. Many sellers of credit risk protection, as one might presume, 
have experienced large losses, but because of significant capital, they 
were able to avoid the widespread defaults of earlier periods of stress. 
It is noteworthy that payouts in the still relatively small but rapidly 
growing market in credit derivatives have been proceeding smoothly 
for the most part. Obviously this market is still too new to have been 
tested in a widespread down-cycle for credit, but, to date, it appears to 
have functioned well. 


Major participants in the market for credit derivatives are banks, insurance compa- 
nies and investment funds. Banks are typically net buyers of protection against credit 
events; insurance companies and other investors are net sellers of credit protection. 


9.1.2 Single-Name Credit Derivatives 


Credit default swaps. Credit default swaps (CDSs) are the workhorse of the credit 
derivatives market; according to a study by Patel (2002), in 2002 the market share 
of CDSs in the credit derivatives market was approximately 67%. Hence the market 
for CDSs written on larger corporations is fairly liquid. Moreover, in contrast to 
corporate bonds, the profitability of CDSs is barely affected by tax issues. For these 
reasons CDSs are the natural underlying security for many more complex credit 
derivatives, and models for pricing portfolio-related credit derivatives are usually 
calibrated to quoted CDS spreads (see Section 9.3.3 below for details). 

[Figure 9.1: diagram. A makes premium payments to B until default or maturity. If default of C occurs, B makes the default payment to A; if not, no payment is made.]

Figure 9.1. The basic structure of a CDS. Firm C is the reference entity,
firm A is the protection buyer, and firm B is the protection seller.

The basic structure of a CDS is depicted in Figure 9.1. There are three parties
involved in a CDS: the reference entity, the protection buyer and the protection
seller. If the reference entity experiences a default event before the maturity date T
of the contract, the protection seller makes a default payment to the protection buyer, 
which mimics the loss on a security issued by the reference entity (often a corporate 
bond) due to the default; this part of a CDS is called the default payment leg. In this 
way the protection buyer has acquired financial protection against the loss on the 
reference asset he would incur in case of a default; note, however, that the protection 
buyer is not obliged to hold the reference asset. As a compensation the protection 
buyer makes a periodic premium payment (typically quarterly or semiannually) to 
the protection seller (the premium payment leg); after the default of the reference 
entity, premium payments stop. There is no initial payment. The premium payments 
are quoted in the form of an annualized percentage x* of the notional value of 
the reference asset; x* is termed the (fair or market quoted) CDS spread. For a 
mathematical description of the payments, see Section 9.3.3 below. 
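
The two legs can be illustrated with a schematic, undiscounted cash-flow schedule seen from the protection buyer's side. This is only a sketch under simplifying assumptions of our own: premiums are paid at the end of each period as long as no default has occurred, no accrued premium is paid at default, and the default payment is a fixed fraction of the notional. The function name, its parameters and the numerical inputs are hypothetical.

```python
def cds_cash_flows(spread, notional, maturity, default_time=None,
                   loss_given_default=0.6, payments_per_year=4):
    """Return undiscounted (time, amount) cash flows for the protection buyer.

    Premium payments are negative, the default payment is positive.
    """
    dt = 1.0 / payments_per_year
    flows = []
    for k in range(1, round(maturity * payments_per_year) + 1):
        t = k * dt
        if default_time is not None and default_time <= t:
            break  # premium payments stop once the reference entity has defaulted
        flows.append((t, -spread * notional * dt))
    if default_time is not None and default_time < maturity:
        flows.append((default_time, loss_given_default * notional))
    return flows


# Two-year contract, 1% annualized spread, quarterly premiums, default after 1.1 years.
flows_default = cds_cash_flows(0.01, 100.0, 2.0, default_time=1.1)
# Same contract without a default before maturity.
flows_no_default = cds_cash_flows(0.01, 100.0, 2.0)
print(flows_default)
print(flows_no_default)
```

In the first case four premiums are paid before the default payment is received; in the second the buyer pays all eight premiums and receives nothing.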

There are a number of technical and legal issues in the specification of a CDS. 
In particular, the parties have to agree on the precise definition of a default event 
and on a procedure to determine the size of the default payment in case a default 
event of the reference entity occurs. Note that a CDS is traded over the counter and 
is not guaranteed by some clearing house. Hence it is possible that the protection 
seller itself defaults before the maturity of the contract, in which case the default 
protection acquired by the protection buyer becomes worthless. 


Credit-linked notes. A credit-linked note is a combination of a credit derivative and 
a coupon bond that is sold as a fixed package. The coupon payments (and sometimes 
also the repayment of the principal) are reduced if a third party (the reference entity) 
experiences a default event during the lifetime of the contract, so the buyer of a 
credit-linked note is providing credit protection for the seller. Credit-linked notes 
are issued essentially for two reasons. First, from a legal point of view, a credit-linked 
note is treated as a fixed-income investment, so that investors who are unable to enter 
into a transaction involving credit derivatives directly may nonetheless sell credit 
protection by buying credit-linked notes. Second, an investor buying a credit-linked 
note pays the price up front, so that the credit protection sale is fully collateralized,
i.e. the protection buyer (the issuer of the credit-linked note) is protected against
losses caused by the default of the protection seller.


9.1.3 Portfolio Credit Derivatives


Notation. In order to describe the pay-off of portfolio credit derivatives we introduce
some notation. We consider a portfolio with $m$ firms. The random vector
$Y_t = (Y_{t,1}, \ldots, Y_{t,m})$ describes the state of our portfolio at time $t \ge 0$. In keeping
with the notation introduced in Chapter 8, $Y_{t,i} = 1$ if firm $i$ has defaulted
up to time $t$, and $Y_{t,i} = 0$ otherwise; $(Y_{t,i})$ is termed the default indicator process
of firm $i$. The default time of firm $i$ is denoted by $\tau_i > 0$. Assuming that
there are no simultaneous defaults in our portfolio, we may define the ordered
default times $T_0 < T_1 < \cdots < T_m$ recursively by $T_0 = 0$ and, for $1 \le n \le m$,
$T_n = \min\{\tau_i : \tau_i > T_{n-1},\ 1 \le i \le m\}$. By $\xi_n \in \{1, \ldots, m\}$ we denote the identity
of the firm defaulting at time $T_n$, i.e. $\xi_n = i$ if $\tau_i = T_n$. As in Chapter 8, the exposure
to reference entity $i$ is denoted by $e_i$; the percentage loss given default of firm $i$ is
denoted by $\delta_i \in [0, 1]$. The cumulative loss of the portfolio up to time $t$ is thus given
by $L_t = \sum_{i=1}^m \delta_i e_i Y_{t,i}$.
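
The notation can be made concrete with a few lines of code. The sketch below takes a given list of default times, which are purely illustrative numbers, and computes the ordered default times together with the identities of the defaulting firms and the cumulative loss; the function names are our own.

```python
def ordered_defaults(tau):
    """Return [(T_1, xi_1), ..., (T_m, xi_m)] with firm identities numbered from 1.

    Assumes that there are no simultaneous defaults.
    """
    return sorted((t, i + 1) for i, t in enumerate(tau))


def cumulative_loss(t, tau, exposure, lgd):
    """L_t: sum of delta_i * e_i over firms with tau_i <= t (i.e. Y_{t,i} = 1)."""
    return sum(d * e for tau_i, e, d in zip(tau, exposure, lgd) if tau_i <= t)


tau = [2.5, 0.7, 4.0]          # illustrative default times of firms 1, 2, 3
exposure = [10.0, 20.0, 30.0]  # e_i
lgd = [0.5, 0.5, 0.5]          # delta_i

print(ordered_defaults(tau))
print(cumulative_loss(3.0, tau, exposure, lgd))
```

With these inputs firm 2 defaults first, then firm 1, then firm 3, and the loss up to time 3 consists of the contributions of firms 1 and 2.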


Basket default swaps. Basket default swaps, or, more technically, kth-to-default
swaps, offer protection against the kth default in a portfolio with $m > k$ obligors
(the basket). As in the case of an ordinary CDS the premium payments on a
kth-to-default swap take the form of a periodic payment stream, which stops at the kth
default time $T_k$. The default payment is triggered if $T_k$ is smaller than the maturity
date of the swap; the size of the default payment may depend on the identity $\xi_k$ of the
kth defaulting firm. While first-to-default swaps are traded frequently, higher-order
default swaps are encountered only occasionally in real markets. We discuss the
pricing of first-to-default swaps in Sections 9.6.3 and 9.8.1 below.
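
The trigger mechanism of a kth-to-default swap can be sketched as follows for one given scenario of default times. The function name and the firm-specific payment sizes are hypothetical and serve only to show that the payment may depend on which firm is the kth to default.

```python
def kth_to_default_payment(tau, k, maturity, default_payment):
    """Return (T_k, xi_k, payment) if the kth default occurs before maturity, else None.

    tau: default times of the firms in the basket (no simultaneous defaults assumed).
    default_payment: payment size for each firm, indexed like tau.
    """
    ordered = sorted((t, i + 1) for i, t in enumerate(tau))
    t_k, xi_k = ordered[k - 1]
    if t_k < maturity:
        return t_k, xi_k, default_payment[xi_k - 1]
    return None


tau = [2.5, 0.7, 4.0]         # illustrative default times of firms 1, 2, 3
payments = [5.0, 10.0, 15.0]  # hypothetical payment sizes per firm

first = kth_to_default_payment(tau, 1, 3.0, payments)
second = kth_to_default_payment(tau, 2, 3.0, payments)
third = kth_to_default_payment(tau, 3, 3.0, payments)
print(first, second, third)
```

With a maturity of three years the first-to-default and second-to-default swaps are triggered, whereas the third default occurs after maturity.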


Collateralized debt obligations (CDOs). CDOs are, at the time of writing, the most
important class of portfolio credit derivatives. A CDO is a financial instrument for 
the securitization of credit-risky securities related to a pool of reference entities such 
as bonds, loans or protection-seller positions in single-name CDSs; these securities 
form the asset side of the CDO. While many different types of CDO exist, the basic 
structure is the same. The assets are sold to a so-called special-purpose vehicle 
(SPV), a company that has been set up with the single purpose of carrying out 
the securitization deal. To finance the acquisition of the assets, the SPV issues 
notes belonging to tranches of different seniorities, which form the liability side 
of the structure. This amounts to a repackaging of the assets. The tranches of the 
liability side are called (in order of increasing seniority) equity, mezzanine and 
senior (sometimes also super-senior) tranches. In this way most of the losses on the 
asset side generated by credit events are borne by the equity tranche, so the notes 
issued by the SPV belonging to the more senior tranches have a credit rating which 
is substantially higher than the average credit quality of the asset pool. This makes 
the notes attractive to certain investor classes. 

If the asset side consists mainly of bonds and loans, one speaks of asset-based 
structures; if the underlying asset pool consists mainly of protection-seller positions 
in single-name CDSs, the structure is termed a synthetic CDO. The cash-flows of a
synthetic CDO are slightly different from those of an asset-based CDO: on the asset
side the SPV receives the premium payments on the CDSs in the asset pool and
makes the corresponding default payments; on the liability side the SPV receives
default payments from the noteholders, which are triggered by credit events in the
asset pool, and makes periodic premium payments as a compensation. The payments
associated with a typical CDO are depicted schematically in Figure 9.2.

Figure 9.2. Schematic representation of the payments in a CDO structure.
Payments corresponding to synthetic CDOs are indicated in italics.

Asset-based structures where the asset pool consists mainly of bonds are known
as collateralized bond obligations (CBOs); if the asset side consists mainly of loans, a
CDO is termed a collateralized loan obligation (CLO). If the underlying asset pool is
actively traded with the goal of enhancing its value, a CDO structure is known as
an arbitrage CDO; if the asset side remains relatively constant during the lifetime
of the structure, one speaks of a balance-sheet CDO. Necessarily the asset pool of 
an arbitrage CDO consists of tradable securities such as bonds or CDSs. 

There are a number of economic motivations for arranging a CDO transaction. 


• The proceeds from the sale of the notes issued by the SPV are often higher
  than the initial value of the asset side of the structure, as the risk–return profile
  of the notes is more favourable for investors. Similarly, in a synthetic CDO
  the present value of the premium payments received by the SPV may exceed
  the present value of the premium payments the SPV has to make. Arbitrage
  CDOs are set up with the explicit purpose of exploiting this difference.

• Balance-sheet CDOs are often set up by banks who want to sell some of the
  credit-risky securities on their balance sheet in order to reduce their regulatory
  capital requirements; this is the typical motivation for arranging a balance-sheet
  CLO transaction. In this way a bank can free up regulatory capital.


Stylized CDOs. Existing CDO contracts can be quite complicated. We therefore 
discuss only stylized CDOs, as this allows us to gain a better understanding of 
the main qualitative features of these products without getting bogged down in 
institutional details. We consider a portfolio of $m$ different firms with cumulative loss
$L_t = \sum_{i=1}^m \delta_i e_i Y_{t,i}$. We consider a CDO with $k$ tranches, indexed by $\kappa \in \{1, \ldots, k\}$,
and characterized by attachment points $0 = K_0 < K_1 < \cdots < K_k \le \sum_{i=1}^m e_i$. The
value of the notional corresponding to tranche $\kappa$ can be described as follows. Initially,
the notional is equal to $K_\kappa - K_{\kappa-1}$; it is reduced whenever there is a default event
such that the cumulative loss falls in the layer $[K_{\kappa-1}, K_\kappa]$. In mathematical terms,
$N_\kappa(t)$, the notional of tranche $\kappa$ at time $t$, is given by


$$N_\kappa(t) = f_\kappa(L_t) \quad\text{with}\quad f_\kappa(l) = \begin{cases} K_\kappa - K_{\kappa-1}, & \text{for } l < K_{\kappa-1}, \\ K_\kappa - l, & \text{for } l \in [K_{\kappa-1}, K_\kappa], \\ 0, & \text{for } l > K_\kappa. \end{cases} \tag{9.1}$$


Note that $f_\kappa$ can be written more succinctly as $f_\kappa(l) = (K_\kappa - l)^+ - (K_{\kappa-1} - l)^+$,
i.e. the notional is equal to the sum of a long position in a put option on $L_t$ with
strike price $K_\kappa$ and a short position in a put with strike price $K_{\kappa-1}$. Such positions
are sometimes called a put spread.
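
The equivalence of the piecewise definition in (9.1) and the put-spread representation is easily checked numerically. The following sketch implements both forms and compares them on a grid of loss values; the function names and the attachment points 0, 20, 40 and 60 are illustrative choices of our own.

```python
def tranche_notional_piecewise(loss, k_lower, k_upper):
    """Tranche notional as in the piecewise definition (9.1)."""
    if loss < k_lower:
        return k_upper - k_lower
    if loss <= k_upper:
        return k_upper - loss
    return 0.0


def tranche_notional_put_spread(loss, k_lower, k_upper):
    """Long put with strike k_upper, short put with strike k_lower."""
    return max(k_upper - loss, 0.0) - max(k_lower - loss, 0.0)


attachment = [0.0, 20.0, 40.0, 60.0]  # illustrative K_0, K_1, K_2, K_3
layers = list(zip(attachment[:-1], attachment[1:]))

# Compare the two representations on a grid of loss values from 0 to 80.
max_difference = max(
    abs(tranche_notional_piecewise(l / 2.0, lo, hi)
        - tranche_notional_put_spread(l / 2.0, lo, hi))
    for l in range(0, 161)
    for lo, hi in layers
)

# Notionals of the three tranches for a cumulative loss of 25.
notionals_at_25 = [tranche_notional_put_spread(25.0, lo, hi) for lo, hi in layers]
print(max_difference, notionals_at_25)
```

For a cumulative loss of 25 the lowest tranche is wiped out, the middle tranche is reduced to 15 and the top tranche retains its full notional of 20.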

In a stylized CDO with maturity $T$ as considered here, the pay-off of tranche $\kappa$ is
equal to $N_\kappa(T)$. In Figure 9.3 we have graphed the pay-off for a stylized CDO with
three tranches (equity, mezzanine, senior) on a homogeneous portfolio of $m = 1000$
firms, each with exposure one unit. The attachment points are $K_1 = 20$, $K_2 = 40$,
$K_3 = 60$, corresponding to 2%, 4% and 6% of the overall exposure; tranches with
higher attachment points are ignored. Assuming that $T$ equals one year and that
we have a homogeneous portfolio with $\delta_i = 0.5$ for all firms, we have plotted two
distributions for $L_1$: first, a loss distribution corresponding to a one-year default
probability of 0.5% and a default correlation of 2%; second, a loss distribution with
a one-year default probability of 0.5% but with independent defaults. In both cases
the expected loss is given by $E(L_1) = 25$. Figure 9.3 illustrates how the value of
different CDO tranches depends on the nature of the dependence between default
events.


• For independent defaults, $L_1$ is typically close to its mean due to diversification
  effects within the portfolio. Hence it is quite unlikely that a tranche
  $\kappa$ with lower attachment point $K_{\kappa-1}$ substantially larger than $E(L_1)$ (such
  as the senior tranche in Figure 9.3) suffers a loss, so the value of such a
  tranche is quite high. On the other hand, since the attachment point $K_1$ of the
  equity tranche is typically lower than $E(L_1)$, it is quite unlikely that $L_1$ is
  substantially smaller than $K_1$, and the value of the equity tranche is low.

• If defaults are (strongly) dependent, diversification effects in the portfolio
  are less pronounced. Realizations with $L_1$ bigger than the lower attachment
  point $K_2$ of the senior tranche are more likely, as are realizations with $L_1$
  smaller than the upper attachment point $K_1$ of the equity tranche. This reduces
  the value of tranches with high seniority and increases the value of the equity
  tranche compared with the case with (almost) independent defaults.


The impact of changing default correlations on mezzanine tranches is unclear and 
cannot be predicted up front. The relationship between default dependence and the 
value of CDO tranches carries over to the more complex structures that are actually 
traded, so dependence modelling is a key issue in any model for pricing CDO 
tranches.

Figure 9.3. Pay-off of a stylized CDO contract and distribution of the one-year loss $L_1$
for a default probability of 0.5% and different default correlations. Detailed explanations are
given in the text.
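
The qualitative effect of default dependence on the equity and senior tranches can be reproduced with a small Monte Carlo sketch. Several choices here are assumptions of our own and not taken from the text: dependence is generated by an exchangeable beta mixture, for which the default probability is $a/(a+b)$ and the default correlation is $1/(a+b+1)$; the one-year default probability is set to 5%, which with $m = 1000$, unit exposures and $\delta_i = 0.5$ gives an expected loss of 25; and the value of a tranche is proxied by its expected notional at maturity, ignoring discounting and the choice of pricing measure.

```python
import random

M_FIRMS = 1000   # number of firms, each with unit exposure
LGD = 0.5        # loss given default delta_i
PD = 0.05        # ASSUMPTION: chosen so that E(L_1) = 1000 * 0.5 * 0.05 = 25
RHO = 0.02       # default correlation in the dependent case
ATTACHMENT = [0.0, 20.0, 40.0, 60.0]  # K_0, K_1, K_2, K_3


def tranche_notional(loss, k_lower, k_upper):
    """Put-spread form of the tranche notional."""
    return max(k_upper - loss, 0.0) - max(k_lower - loss, 0.0)


def simulate_losses(n_sims, dependent, rng):
    """Simulate L_1 under independence or under a beta mixture (ASSUMED model)."""
    a = PD * (1.0 / RHO - 1.0)          # a / (a + b) = PD
    b = (1.0 - PD) * (1.0 / RHO - 1.0)  # 1 / (a + b + 1) = RHO
    losses = []
    for _ in range(n_sims):
        q = rng.betavariate(a, b) if dependent else PD
        defaults = sum(rng.random() < q for _ in range(M_FIRMS))
        losses.append(LGD * defaults)
    return losses


def expected_tranche_notionals(losses):
    """Average notional at maturity for the equity, mezzanine and senior tranches."""
    n = len(losses)
    return [sum(tranche_notional(l, lo, hi) for l in losses) / n
            for lo, hi in zip(ATTACHMENT[:-1], ATTACHMENT[1:])]


rng = random.Random(2024)
losses_indep = simulate_losses(2000, False, rng)
losses_dep = simulate_losses(2000, True, rng)
indep = expected_tranche_notionals(losses_indep)
dep = expected_tranche_notionals(losses_dep)
print("independent:", indep)
print("dependent:  ", dep)
```

In this sketch the expected notional of the equity tranche is higher, and that of the senior tranche lower, in the dependent case, in line with the discussion above.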


Notes and Comments 


In this brief introduction we have discussed a few essential features of credit deriva- 
tives, but have omitted the rather involved regulatory, legal and accounting issues 
related to these instruments. Readers interested in these topics are referred to the
book by Tavakoli (2001) or the recent paper collections edited by Gregory (2003) 
and Perraudin (2004); the last two references also discuss pricing issues. An excel- 
lent treatment of credit derivatives at textbook level is Schönbucher (2003). The 
pricing of interest-rate swaps in the presence of default risk is discussed, for example,
in Chapter 7 of Lando (2004); a good starting point for tackling the rich literature 
on pricing convertible bonds with credit risk is Chapter 9 of Duffie and Singleton 
(2003). 

The credit derivatives market is evolving rapidly and new publications on these 
instruments appear on a regular basis. The excellent website www.defaultrisk.com, 
maintained by Greg Gupton, is a good place to look for new developments. 


9.2 Mathematical Tools 


In this section we present some mathematical tools for the analysis of reduced- 
form credit risk models. In particular, we discuss random times (in applications, 
usually the default time of a firm), hazard rates and martingale intensities. We start 
with random times with deterministic hazard rates or, alternatively, with a situation 
where the only observable quantity is the default time itself. This forms the basis 
for an analysis of a more realistic situation where additional information, generated 
for instance by economic background processes, is available, so the hazard rate will 
typically be stochastic. We give a detailed treatment of doubly stochastic random 
times. Doubly stochastic random times are the simplest example of random times 
with stochastic hazard rates and are thus frequently used in dynamic credit risk 
models.

In our analysis we inevitably have to use basic notions from the theory of stochastic
processes, such as filtrations, stopping times or basic martingale theory. These issues 
are covered in many standard textbooks on mathematical finance and probability 
theory. For our purposes the technical level of Williams (1991) is sufficient. 

In this chapter we use the following notational convention. A generic stochastic 
process in continuous time is denoted by $(X_t)$; the rv $X_t$ gives the value of the
process at time $t \ge 0$. Deterministic functions of time are denoted $f(t)$ for $t \ge 0$. For
typographical reasons the notation $X(t)$ is occasionally used for random quantities
as well. While this notation differs slightly from the conventions introduced in 
Chapter 2, no confusion can arise, as we are dealing exclusively with continuous- 
time processes. 


9.2.1 Random Times and Hazard Rates 


We consider a probability space $(\Omega, \mathcal{F}, P)$ and a random time $\tau$ defined on this
space, i.e. an $\mathcal{F}$-measurable rv taking values in $[0, \infty]$, to be interpreted as the
default time of some company. By $F(t) = P(\tau \le t)$ we denote the df of $\tau$ and
by $\bar{F}(t) := 1 - F(t)$ the tail or survival function of $\tau$; we assume that
$P(\tau = 0) = F(0) = 0$, and that $\bar{F}(t) > 0$ for all $t < \infty$. We define the jump or default
indicator process $(Y_t)$ associated with $\tau$ by $Y_t = I_{\{\tau \le t\}}$ for $t \ge 0$. Note that $(Y_t)$ is
a right-continuous process which jumps from 0 to 1 at the default time $\tau$ and that
$1 - Y_t = I_{\{\tau > t\}}$.

A filtration $(\mathcal{F}_t)$ on $(\Omega, \mathcal{F})$ is an increasing family $\{\mathcal{F}_t : t \ge 0\}$ of sub-$\sigma$-algebras
of $\mathcal{F}$: $\mathcal{F}_t \subset \mathcal{F}_s \subset \mathcal{F}$ for $0 \le t \le s < \infty$. For a generic filtration $(\mathcal{F}_t)$ we set
$\mathcal{F}_\infty = \sigma(\bigcup_{t \ge 0} \mathcal{F}_t)$. Filtrations are used to model the flow of information in a random
system. $\mathcal{F}_t$ represents the state of knowledge of an observer at time $t$, and $A \in \mathcal{F}_t$
is taken to mean that at time $t$ the observer is able to determine if the event $A$ has
occurred. In this section we assume that the only observable quantity is the random
time $\tau$ or, equivalently, the associated jump indicator process $(Y_t)$. The appropriate
filtration is therefore given by $(\mathcal{H}_t)$ with

$$\mathcal{H}_t = \sigma(\{Y_u : u \le t\}), \tag{9.2}$$

the history of the default information up to and including time $t$. By definition, $\tau$ is
an $(\mathcal{H}_t)$-stopping time, as $\{\tau \le t\} = \{Y_t = 1\} \in \mathcal{H}_t$ for all $t \ge 0$; moreover, $(\mathcal{H}_t)$
is obviously the smallest filtration with this property.

Definition 9.1 (cumulative hazard function and hazard rate). The function
$\Gamma(t) := -\ln(\bar{F}(t))$ is called the cumulative hazard function of the random time $\tau$. If
$F$ is absolutely continuous with density $f$, the function
$\gamma(t) := f(t)/(1 - F(t)) = f(t)/\bar{F}(t)$ is called the hazard rate of $\tau$.

By definition we have $F(t) = 1 - e^{-\Gamma(t)}$ and $\Gamma'(t) = f(t)/\bar{F}(t) = \gamma(t)$, so
$\Gamma(t) = \int_0^t \gamma(s)\,ds$. The hazard rate $\gamma(t)$ can be interpreted as the instantaneous
chance of default at $t$, given survival up to time $t$. In fact, for $h > 0$ we have
$P(\tau \le t + h \mid \tau > t) = (F(t + h) - F(t))/\bar{F}(t)$. Hence we obtain

$$\lim_{h \to 0} \frac{1}{h} P(\tau \le t + h \mid \tau > t) = \frac{1}{\bar{F}(t)} \lim_{h \to 0} \frac{F(t + h) - F(t)}{h} = \gamma(t).$$

Example 9.2. Consider two popular distributions for survival times: the exponential
distribution and the Weibull distribution. Recall that the df of the exponential
distribution with parameter $\lambda$ equals $F(t) = 1 - e^{-\lambda t}$, so that $\gamma(t) = \lambda$ for all
$t > 0$. The df of the Weibull distribution is given by $F(t) = 1 - \exp(-\lambda t^{\alpha})$ for
parameters $\lambda, \alpha > 0$. This yields $f(t) = \lambda \alpha t^{\alpha - 1} \exp(-\lambda t^{\alpha})$ and $\gamma(t) = \lambda \alpha t^{\alpha - 1}$,
which is decreasing in $t$ if $\alpha < 1$ and increasing if $\alpha > 1$. For $\alpha = 1$ we have the
special case of the exponential distribution.
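
The hazard rates of Example 9.2 can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names and the parameter value lam = 0.05 are illustrative assumptions. It compares the closed-form Weibull hazard rate with the defining ratio of density to survival function.

```python
import math

def weibull_survival(t, lam, alpha):
    """Survival function 1 - F(t) = exp(-lam * t**alpha)."""
    return math.exp(-lam * t ** alpha)

def weibull_density(t, lam, alpha):
    """Density f(t) = lam * alpha * t**(alpha - 1) * exp(-lam * t**alpha)."""
    return lam * alpha * t ** (alpha - 1) * math.exp(-lam * t ** alpha)

def weibull_hazard(t, lam, alpha):
    """Closed-form hazard rate gamma(t) = lam * alpha * t**(alpha - 1)."""
    return lam * alpha * t ** (alpha - 1)

def hazard_from_definition(t, lam, alpha):
    """Hazard rate computed directly as f(t) / (1 - F(t))."""
    return weibull_density(t, lam, alpha) / weibull_survival(t, lam, alpha)

lam = 0.05  # illustrative parameter value (assumption)
for alpha in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, lam, alpha) for t in (1.0, 2.0, 5.0)]
    print(alpha, [round(rate, 4) for rate in rates])
```

For alpha = 1 the printed rates are constant and equal to lam, which is the exponential special case.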


Next we discuss conditional expectations with respect to the $\sigma$-algebra $\mathcal{H}_t$. We
need the following auxiliary result on the structure of $\mathcal{H}_t$-measurable rvs.


Lemma 9.3. Every $\mathcal{H}_t$-measurable rv $H$ is of the form $H = h(\tau) I_{\{\tau \le t\}} + c I_{\{\tau > t\}}$
for a measurable function $h : [0, t] \to \mathbb{R}$ and some constant $c \in \mathbb{R}$.


Proof. Intuitively the result is obvious, since $\mathcal{H}_t$-measurable rvs can be expressed
as functions of events related to the default history at $t$. More formally, we argue as
follows. The $\sigma$-algebra $\mathcal{H}_t$ is generated by the events $\{Y_u = 1\} = \{\tau \le u\}$, $u \le t$,
and $\{Y_t = 0\} = \{\tau > t\}$, and hence by the rvs $\min\{\tau, t\} =: (\tau \wedge t)$ and $I_{\{\tau > t\}}$. This
implies that any $\mathcal{H}_t$-measurable rv $H$ can be written as $H = g(\tau \wedge t, I_{\{\tau > t\}})$ for
some measurable function $g : [0, t] \times \{0, 1\} \to \mathbb{R}$. The claim follows if we define
$h(u) := g(u, 0)$, $u \le t$, and $c := g(t, 1)$.


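Lemma 9.4. Let $\tau$ be a random time with jump indicator process $Y_t = I_{\{\tau \le t\}}$ and
natural filtration $(\mathcal{H}_t)$. Then, for any integrable rv $X$ and any $t \ge 0$, we have

$$E(I_{\{\tau > t\}} X \mid \mathcal{H}_t) = I_{\{\tau > t\}} \frac{E(X; \tau > t)}{P(\tau > t)}. \tag{9.3}$$

Proof. Since $E(I_{\{\tau > t\}} X \mid \mathcal{H}_t)$ is $\mathcal{H}_t$-measurable and zero on $\{\tau \le t\}$, we
obtain from Lemma 9.3 that $E(I_{\{\tau > t\}} X \mid \mathcal{H}_t) = I_{\{\tau > t\}} c$ for some constant $c$.
Taking expectations yields $E(X; \tau > t) = c P(\tau > t)$ and hence
$c = E(X; \tau > t)/P(\tau > t)$.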


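As an example we compute conditional survival probabilities. Taking $X := I_{\{\tau > s\}}$
for $s > t$ in (9.3), we get

$$P(\tau > s \mid \mathcal{H}_t) = E(X \mid \mathcal{H}_t) = E(I_{\{\tau > t\}} X \mid \mathcal{H}_t) = I_{\{\tau > t\}} \frac{\bar{F}(s)}{\bar{F}(t)}. \tag{9.4}$$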
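
Formula (9.4) can be illustrated by simulation. The following Python sketch is an illustration only; the Weibull parameters, the dates t = 2 and s = 5, the sample size and the seed are assumptions chosen for the example. It compares the ratio of survival functions with the empirical frequency of survival past s among simulated default times that exceed t.

```python
import math
import random

def conditional_survival(survival, t, s):
    """(9.4) on the event {tau > t}: P(tau > s | H_t) = Fbar(s) / Fbar(t)."""
    return survival(s) / survival(t)

lam, alpha = 0.05, 1.5                      # illustrative Weibull parameters
t, s = 2.0, 5.0

def survival(u):
    """Weibull tail function Fbar(u) = exp(-lam * u**alpha)."""
    return math.exp(-lam * u ** alpha)

exact = conditional_survival(survival, t, s)

# Monte Carlo check: draw tau by inverting the tail function, then condition
# on survival past t. Using 1 - random() keeps the argument of log in (0, 1].
rng = random.Random(0)
draws = [(-math.log(1.0 - rng.random()) / lam) ** (1.0 / alpha) for _ in range(200_000)]
alive_at_t = [tau for tau in draws if tau > t]
estimate = sum(tau > s for tau in alive_at_t) / len(alive_at_t)
print(round(exact, 4), round(estimate, 4))
```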

The next proposition contains the first result on the stochastic-process properties
of the jump indicator process of a random time $\tau$. Let $(\mathcal{F}_t)$ be a generic filtration. An
$(\mathcal{F}_t)$-adapted and integrable process $(M_t)$ is called an $(\mathcal{F}_t)$-martingale if
$E(M_s \mid \mathcal{F}_t) = M_t$ for all $0 \le t \le s$, i.e. if the current value $M_t$ is the best prediction (in the
mean square sense) of the future value $M_s$.


Proposition 9.5. Let $\tau$ be a random time with absolutely continuous df $F(t)$
and hazard-rate function $\gamma(t)$. Then $M_t := Y_t - \int_0^{t \wedge \tau} \gamma(s)\,ds$, $t \ge 0$, is an
$(\mathcal{H}_t)$-martingale.

Here and below $t \wedge \tau$ is short for $\min\{t, \tau\}$. In Section 9.2.3 we extend this
result to doubly stochastic random times and discuss its financial and mathematical
relevance.


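Proof. Let $s > t$. We have to show that $E(M_s - M_t \mid \mathcal{H}_t) = 0$, i.e. that
$E(Y_s - Y_t \mid \mathcal{H}_t) = E(\int_t^s \gamma(u) I_{\{u < \tau\}}\,du \mid \mathcal{H}_t)$. Using (9.4), we get

$$E(Y_s - Y_t \mid \mathcal{H}_t) = I_{\{\tau > t\}} P(\tau \le s \mid \mathcal{H}_t) = I_{\{\tau > t\}} \left(1 - \frac{\bar{F}(s)}{\bar{F}(t)}\right) = I_{\{\tau > t\}} \frac{\bar{F}(t) - \bar{F}(s)}{\bar{F}(t)}.$$

Note that $X := \int_t^s \gamma(u) I_{\{u < \tau\}}\,du$ is zero on $\{\tau \le t\}$, so $X = X I_{\{\tau > t\}}$. Hence
we obtain from Lemma 9.4, Fubini's Theorem and the identity
$\bar{F}'(t) = -f(t) = -\gamma(t) \bar{F}(t)$ that

$$E(X \mid \mathcal{H}_t) = I_{\{\tau > t\}} \frac{E(X)}{\bar{F}(t)} = I_{\{\tau > t\}} \frac{\int_t^s \gamma(u) \bar{F}(u)\,du}{\bar{F}(t)} = I_{\{\tau > t\}} \frac{\bar{F}(t) - \bar{F}(s)}{\bar{F}(t)},$$

and the result follows.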
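
A necessary consequence of Proposition 9.5 is that the compensated indicator has constant mean zero. The following Python sketch checks this by simulation for an exponential default time; it is an illustration only, and the constant hazard rate lam = 0.3, the sample size and the seed are assumptions. A constant mean is of course weaker than the martingale property itself.

```python
import random

def compensated_indicator(tau, t, lam):
    """M_t = Y_t - lam * min(tau, t) for a constant hazard rate lam."""
    jump = 1.0 if tau <= t else 0.0
    return jump - lam * min(tau, t)

lam = 0.3                                   # illustrative constant hazard rate
rng = random.Random(1)
taus = [rng.expovariate(lam) for _ in range(100_000)]

def sample_mean_of_m(t):
    """Sample mean of M_t over the simulated default times."""
    return sum(compensated_indicator(tau, t, lam) for tau in taus) / len(taus)

for t in (0.5, 2.0, 10.0):
    print(t, round(sample_mean_of_m(t), 4))   # each value should be close to zero
```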


9.2.2 Modelling Additional Information 


We now consider a situation where additional information affecting the distribution
of $\tau$ is available. In the context of credit risk models this information is typically
generated by background processes, often modelled as diffusions or continuous-time
Markov chains, representing, for instance, economic activity in a country or in
an industry sector, risk-free interest rates or rating transitions between non-default
states. Formally, we represent this additional information by some filtration $(\mathcal{F}_t)$
on $(\Omega, \mathcal{F}, P)$.


Definition 9.6 (cumulative hazard and hazard-rate processes). Let $\tau$ be a random
time on the filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$ with $P(\tau > 0) = 1$.
Let $F_t = P(\tau \le t \mid \mathcal{F}_t)$ and $\bar{F}_t = 1 - F_t$. If $F_t < 1$ for all $t \ge 0$, the
$(\mathcal{F}_t)$-conditional cumulative hazard process $(\Gamma_t)$ is defined by $\Gamma_t := -\ln(\bar{F}_t)$. If $(\Gamma_t)$
is strictly increasing and absolutely continuous, i.e. $\Gamma_t = \int_0^t \gamma_s\,ds$ for some a.s.
strictly positive, $(\mathcal{F}_t)$-adapted process $(\gamma_t)$, then we call $(\gamma_t)$ the $(\mathcal{F}_t)$-conditional
hazard-rate process of $\tau$.


Recall the definition of the filtration $(\mathcal{H}_t)$ in (9.2) and introduce a new filtration
$(\mathcal{G}_t)$ by

$$\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{H}_t, \quad t \ge 0, \tag{9.5}$$

meaning that $\mathcal{G}_t$ is the smallest $\sigma$-algebra that contains $\mathcal{F}_t$ and $\mathcal{H}_t$. Obviously $\tau$ is
an $(\mathcal{H}_t)$-stopping time and hence also a $(\mathcal{G}_t)$-stopping time. In the context of credit
risk models the filtration $(\mathcal{G}_t)$ contains information about the background processes
and the occurrence or non-occurrence of default up to time $t$, and thus typically
corresponds to the information available to investors.

Remark 9.7. The notion of an $(\mathcal{F}_t)$-conditional hazard-rate process is most useful
for the doubly stochastic random times discussed in Section 9.2.3 below. Note that if
we assume that $F_t < 1$ for all $t \ge 0$ so that $(\Gamma_t)$ is well defined, $\tau$ cannot be an
$(\mathcal{F}_t)$-stopping time. Otherwise we would have $F_t = P(\tau \le t \mid \mathcal{F}_t) = I_{\{\tau \le t\}} \in \{0, 1\}$, as
$\{\tau \le t\} \in \mathcal{F}_t$ by the definition of a stopping time. An important example of a random
time, which does not admit a conditional cumulative hazard process, is provided by
the first exit time of Brownian motion from some layer. More precisely, let $(W_t)$ be
standard Brownian motion and let $(\mathcal{F}_t)$ be the filtration generated by $(W_t)$. Consider
some threshold $a < 0$ and define $\tau_a = \inf\{t \ge 0 : W_t \le a\}$. It is well known that $\tau_a$
is an $(\mathcal{F}_t)$-stopping time, so the $(\mathcal{F}_t)$-conditional cumulative hazard process is not
well defined. A similar argument shows that the default time in a first-passage-time
model (see Section 8.2) does not admit a cumulative hazard process (with respect to
the filtration generated by the firm-value process); hence the results derived below
do not apply to those models.


Conditional expectations. Next we extend the results of Section 9.2.1 and discuss
the structure of conditional expectations with respect to the full-information
$\sigma$-algebra $\mathcal{G}_t$. We need the following auxiliary result on the relationship between
the $\sigma$-algebras $\mathcal{F}_t$ and $\mathcal{G}_t$.


Lemma 9.8. For every $\mathcal{G}_t$-measurable rv $X$ there is some $\mathcal{F}_t$-measurable rv $\tilde{X}$ such
that $X I_{\{\tau > t\}} = \tilde{X} I_{\{\tau > t\}}$.


In economic terms this result tells us that before default all information is generated
by the background filtration $(\mathcal{F}_t)$; we omit a formal proof. Now we turn to
conditional expectations with respect to $\mathcal{G}_t$.


Lemma 9.9. For every integrable rv $X$ we have

$$E(I_{\{\tau > t\}} X \mid \mathcal{G}_t) = I_{\{\tau > t\}} \frac{E(I_{\{\tau > t\}} X \mid \mathcal{F}_t)}{P(\tau > t \mid \mathcal{F}_t)}.$$

Note that Lemma 9.9 allows us to replace certain conditional expectations with
respect to $\mathcal{G}_t$ by conditional expectations with respect to the background information
$\mathcal{F}_t$.


Proof. $E(I_{\{\tau > t\}} X \mid \mathcal{G}_t)$ is $\mathcal{G}_t$-measurable and zero on $\{\tau \le t\}$. By Lemma 9.8 there
is therefore an $\mathcal{F}_t$-measurable rv $Z$ such that $E(I_{\{\tau > t\}} X \mid \mathcal{G}_t) = I_{\{\tau > t\}} Z$. Taking
conditional expectations with respect to $\mathcal{F}_t$ yields, as $\mathcal{F}_t \subset \mathcal{G}_t$,

$$E(I_{\{\tau > t\}} X \mid \mathcal{F}_t) = P(\tau > t \mid \mathcal{F}_t) Z.$$

Hence $Z = E(I_{\{\tau > t\}} X \mid \mathcal{F}_t)/P(\tau > t \mid \mathcal{F}_t)$, which proves the lemma.


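Corollary 9.10. Let $s > t$. If $X$ is integrable and $\mathcal{F}_s$-measurable, we have

$$E(I_{\{\tau > s\}} X \mid \mathcal{G}_t) = I_{\{\tau > t\}} E(e^{-(\Gamma_s - \Gamma_t)} X \mid \mathcal{F}_t).$$

Proof. Let $\tilde{X} := I_{\{\tau > s\}} X$. Since $\tilde{X} = I_{\{\tau > t\}} \tilde{X}$ (as $s > t$), Lemma 9.9 yields

$$E(I_{\{\tau > s\}} X \mid \mathcal{G}_t) = E(I_{\{\tau > t\}} \tilde{X} \mid \mathcal{G}_t) = I_{\{\tau > t\}} e^{\Gamma_t} E(I_{\{\tau > s\}} X \mid \mathcal{F}_t),$$

where we have used the fact that $P(\tau > t \mid \mathcal{F}_t) = e^{-\Gamma_t}$. Since $X$ is $\mathcal{F}_s$-measurable,

$$E(I_{\{\tau > s\}} X \mid \mathcal{F}_t) = E(X P(\tau > s \mid \mathcal{F}_s) \mid \mathcal{F}_t) = E(X e^{-\Gamma_s} \mid \mathcal{F}_t),$$

and the result follows.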


Corollary 9.10 will be useful in the pricing of corporate bonds. Suppose that the
default time $\tau$ admits the conditional hazard-rate process $(\gamma_t)$, that the default-free
interest rate $(r_t)$ is adapted to the background filtration $(\mathcal{F}_t)$, and that $P$ represents
the probability measure used for pricing (to be explained in Section 9.3). Consider
a corporate zero-coupon bond with zero recovery and maturity $T > t$; at maturity
its value is given by $I_{\{\tau > T\}}$. Define $X := \exp(-\int_t^T r_s\,ds)$; the price at time $t$ of our
bond is hence given by $E(I_{\{\tau > T\}} X \mid \mathcal{G}_t)$. We get, from Corollary 9.10,

$$E(I_{\{\tau > T\}} X \mid \mathcal{G}_t) = I_{\{\tau > t\}} E\left(\exp\left(-\int_t^T (r_s + \gamma_s)\,ds\right) \,\Big|\, \mathcal{F}_t\right).$$

Expressions of this type are often easily computed using techniques from standard
default-free term structure models (for details we refer to Sections 9.4.3 and 9.5
below).
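
In the simplest special case of a constant interest rate r and a constant hazard rate gamma (a simplifying assumption made here for illustration, not a case treated in the text above), the conditional expectation in the bond-price formula reduces on the pre-default event to exp(-(r + gamma)(T - t)). The following Python sketch, with hypothetical parameter values, evaluates this price and recovers the hazard rate as the continuously compounded spread over the default-free bond.

```python
import math

def defaultable_zero_bond(r, gamma, t, T):
    """Pre-default price exp(-(r + gamma) * (T - t)) for constant r and gamma."""
    return math.exp(-(r + gamma) * (T - t))

r, gamma, t, T = 0.05, 0.02, 0.0, 5.0       # illustrative values (assumptions)
risky = defaultable_zero_bond(r, gamma, t, T)
riskless = defaultable_zero_bond(r, 0.0, t, T)

# With zero recovery the continuously compounded spread equals the hazard rate.
spread = -(math.log(risky) - math.log(riskless)) / (T - t)
print(round(risky, 4), round(riskless, 4), round(spread, 4))
```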

Corollary 9.10 moreover implies that in the above setting $\gamma_t$ gives a good approximation
of the one-year default probability in the following sense. We have

$$P(\tau > t + 1 \mid \mathcal{G}_t) = I_{\{\tau > t\}} E\left(\exp\left(-\int_t^{t+1} \gamma_s\,ds\right) \,\Big|\, \mathcal{F}_t\right). \tag{9.6}$$

Suppose now that the hazard rate remains relatively stable over time so that
$P(\gamma_s \approx \gamma_t \text{ for all } s \in [t, t + 1])$ is close to one and that $\tau > t$. Under these assumptions, the
right-hand side of (9.6) is approximated reasonably well by $\exp(-\gamma_t)$. If $\gamma_t$ is not
too large, we thus get on $\{\tau > t\}$ for the one-year default probability

$$P(\tau \le t + 1 \mid \mathcal{G}_t) \approx 1 - \exp(-\gamma_t) \approx \gamma_t. \tag{9.7}$$
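
The quality of the final approximation in (9.7) depends on the size of the hazard rate. The following Python sketch tabulates, for a hazard rate held constant over the year (an assumption) and a few illustrative values, the one-year default probability 1 - exp(-gamma) and the relative error made by replacing it with gamma.

```python
import math

def one_year_default_probability(gamma):
    """One-year default probability 1 - exp(-gamma) for a constant hazard rate."""
    return 1.0 - math.exp(-gamma)

for gamma in (0.001, 0.01, 0.05, 0.20):      # illustrative hazard rates
    exact = one_year_default_probability(gamma)
    relative_error = (gamma - exact) / exact
    print(gamma, round(exact, 5), round(relative_error, 4))
```

The relative error grows roughly like gamma/2, so the approximation is tight for investment-grade hazard rates and visibly loose for very risky names.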


9.2.3 Doubly Stochastic Random Times


Doubly stochastic random times (also called conditional Poisson or Cox random
times in the literature) are the main example of random times with a stochastic
hazard rate. For our analysis of these random times we use the framework introduced
in the previous section. In particular, $(\mathcal{F}_t)$ denotes the background filtration, $(\mathcal{H}_t)$ is
the filtration generated by the jump indicator process associated with the random
time $\tau$, and the filtration $(\mathcal{G}_t)$ is defined by $\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{H}_t$.


Definition 9.11 (doubly stochastic random time). A random time $\tau$ is called
doubly stochastic with respect to the background filtration $(\mathcal{F}_t)$ if $\tau$ admits the
$(\mathcal{F}_t)$-conditional hazard-rate process $(\gamma_t)$, if $\Gamma_t = \int_0^t \gamma_s\,ds$ is strictly increasing, and if,
for all $t > 0$,

$$P(\tau \le t \mid \mathcal{F}_\infty) = P(\tau \le t \mid \mathcal{F}_t). \tag{9.8}$$

Condition (9.8) is most easily interpreted if we assume that the background filtration
is generated by some stochastic state variable process $(W_t)$, i.e. if
$\mathcal{F}_t = \sigma(\{W_u : u \le t\})$. In that case (9.8) states that, given past values $(W_u)_{u \le t}$ of the state
variable, the future $(W_u)_{u > t}$ does not contain any extra information for predicting the
probability that $\tau$ occurs before time $t$. Obviously, (9.8) excludes models where the
probability that $\tau \le t$ depends on the future evolution $(W_u)_{u > t}$ of the state variable.


Construction and simulation via thresholds. In the next lemma we give an explicit 
construction of a doubly stochastic random time. This construction is very useful 
for simulation purposes. 


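Lemma 9.12. Let $E$ be a standard exponentially distributed rv on $(\Omega, \mathcal{F}, P)$ independent
of $\mathcal{F}_\infty$, i.e. $P(E \le t \mid \mathcal{F}_\infty) = 1 - e^{-t}$ for all $t \ge 0$. Let $(\gamma_t)$ be a positive
$(\mathcal{F}_t)$-adapted process such that $\Gamma_t = \int_0^t \gamma_s\,ds$ is strictly increasing and finite for
every $t > 0$. Define the random time $\tau$ by

$$\tau := \Gamma^{-1}(E) = \inf\{t \ge 0 : \Gamma_t \ge E\}. \tag{9.9}$$

Then $\tau$ is doubly stochastic with $(\mathcal{F}_t)$-conditional hazard-rate process $(\gamma_t)$.

Proof. We have, by definition,

$$P(\tau \le t \mid \mathcal{F}_\infty) = P(\Gamma_t \ge E \mid \mathcal{F}_\infty) = 1 - \exp(-\Gamma_t),$$

since $\Gamma_t$ is $\mathcal{F}_\infty$-measurable and $E$ is independent of $\mathcal{F}_\infty$. Moreover, since
$1 - \exp(-\Gamma_t)$ is $\mathcal{F}_t$-measurable, we get, using iterated conditional expectations,

$$P(\tau \le t \mid \mathcal{F}_t) = E(P(\tau \le t \mid \mathcal{F}_\infty) \mid \mathcal{F}_t) = 1 - \exp(-\Gamma_t),$$

which proves the claim.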


Lemma 9.12 has a converse, which is presented next. 


Lemma 9.13. Let $\tau$ be a doubly stochastic random time with $(\mathcal{F}_t)$-conditional
hazard-rate process $(\gamma_t)$. Denote by $\Gamma_t = \int_0^t \gamma_s\,ds$ the conditional cumulative hazard
process of $\tau$ and put $E := \Gamma_\tau$. Then the rv $E$ is standard exponentially distributed
and independent of $\mathcal{F}_\infty$, and $\tau = \Gamma^{-1}(E)$ almost surely.


Proof. Since $(\Gamma_t)$ is strictly increasing by assumption, the relation $\tau = \Gamma^{-1}(E)$ is
clear from the definition of $E$. To prove that $E$ has the correct distribution we argue
as follows:

$$P(E \le t \mid \mathcal{F}_\infty) = P(\Gamma_\tau \le t \mid \mathcal{F}_\infty) = P(\tau \le \Gamma^{-1}(t) \mid \mathcal{F}_\infty).$$

Since $\tau$ is doubly stochastic, the last expression equals $1 - \exp(-\Gamma(\Gamma^{-1}(t))) = 1 - e^{-t}$,
which shows that $E$ is independent of $\mathcal{F}_\infty$ and that it is standard exponentially
distributed.


Lemma 9.12 forms the basis for the following algorithm for the simulation of
doubly stochastic random times.


Figure 9.4. A graphical illustration of Algorithm 9.14: $E \approx 0.44$, $\tau \approx 6.59$.


Algorithm 9.14 (univariate threshold simulation).


(1) Generate a trajectory of the hazard-rate process $(\gamma_t)$. References for suitable
simulation approaches are given in Notes and Comments.


(2) Generate a unit exponential rv $E$ independent of $(\gamma_t)$ (the threshold) and set
$\tau = \Gamma^{-1}(E)$; this step is illustrated in Figure 9.4.
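
The two steps of Algorithm 9.14 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hazard-rate path is taken to be piecewise constant on an equidistant grid and is generated by a driftless geometric random walk chosen purely for convenience, and the function names, parameter values and seed are hypothetical. Any positive path simulated by a method from the references in Notes and Comments could be substituted in step (1).

```python
import math
import random

def simulate_hazard_path(rng, gamma0, vol, dt, n_steps):
    """Step (1): an illustrative positive path (geometric random walk, an assumption).

    path[k] is the hazard rate on the grid cell [k * dt, (k + 1) * dt).
    """
    path = [gamma0]
    for _ in range(n_steps - 1):
        path.append(path[-1] * math.exp(vol * math.sqrt(dt) * rng.gauss(0.0, 1.0)))
    return path

def threshold_default_time(gamma_path, dt, threshold):
    """Step (2): return Gamma^{-1}(threshold), or math.inf if it is not reached on the grid."""
    cumulative = 0.0
    for k, gamma in enumerate(gamma_path):
        increment = gamma * dt
        if cumulative + increment >= threshold:
            # gamma is constant on this cell, so Gamma is linear there.
            return k * dt + (threshold - cumulative) / gamma
        cumulative += increment
    return math.inf

rng = random.Random(42)
dt, n_steps = 1.0 / 252, 252 * 30           # daily grid over 30 years (assumption)
gamma_path = simulate_hazard_path(rng, gamma0=0.05, vol=0.5, dt=dt, n_steps=n_steps)
E = rng.expovariate(1.0)                    # unit exponential threshold, independent of the path
tau = threshold_default_time(gamma_path, dt, E)
print(E, tau)
```

Returning math.inf when the threshold is not reached makes it explicit that no default occurred within the simulated horizon, rather than silently truncating the default time.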


Moreover, Lemmas 9.12 and 9.13 provide an interesting interpretation of doubly
stochastic random times in terms of operational time: for a given $(\mathcal{F}_t)$-adapted
hazard-rate process $(\gamma_t)$, define a new timescale (operational time) by the associated
cumulative hazard process $\Gamma_t = \int_0^t \gamma_s\,ds$, so that $c$ units of operational time correspond
to $\Gamma^{-1}(c)$ units of real time. Take a standard exponential rv $E$ independent of
$\mathcal{F}_\infty$ and measure time in units of operational time. Then the associated calendar time
$\tau := \Gamma^{-1}(E)$ is doubly stochastic by Lemma 9.12. Conversely, by Lemma 9.13, if
we take a doubly stochastic random time $\tau$, the associated operational time $E := \Gamma_\tau$
is standard exponential, independent of $\mathcal{F}_\infty$. The notion of operational time plays
an important role in insurance mathematics (see Section 10.2.7).


Martingale intensity of doubly stochastic random times. We have seen in Proposition 9.5
that the jump indicator process $(Y_t)$ can be turned into an $(\mathcal{H}_t)$-martingale
if we subtract the process $\int_0^{t \wedge \tau} \gamma(s)\,ds$. Here we generalize this result to doubly
stochastic random times.


Proposition 9.15. Let $\tau$ be a doubly stochastic random time with $(\mathcal{F}_t)$-conditional
hazard-rate process $(\gamma_t)$. Then $M_t := Y_t - \int_0^{t \wedge \tau} \gamma_s\,ds$ is a $(\mathcal{G}_t)$-martingale.


Proof. Define a new artificial filtration $(\tilde{\mathcal{G}}_t)$ by $\tilde{\mathcal{G}}_t = \mathcal{F}_\infty \vee \mathcal{H}_t$; in particular,
$\tilde{\mathcal{G}}_0 = \mathcal{F}_\infty$ and $\mathcal{G}_t \subset \tilde{\mathcal{G}}_t$ for all $t$. Conditioning on $\tilde{\mathcal{G}}_0$ turns $\tau$ into a random time with
deterministic hazard rate $\gamma(s)$: we have

$$P(\tau \le t \mid \tilde{\mathcal{G}}_0) = 1 - \exp\left(-\int_0^t \gamma_s\,ds\right),$$

and $\gamma$ is known given $\tilde{\mathcal{G}}_0$. Hence Proposition 9.5 implies that $M_t := Y_t - \int_0^{t \wedge \tau} \gamma_s\,ds$
is a martingale with respect to $(\tilde{\mathcal{G}}_t)$. Since $(M_t)$ is $(\mathcal{G}_t)$-adapted and $\mathcal{G}_t \subset \tilde{\mathcal{G}}_t$, $(M_t)$ is
also a martingale with respect to $(\mathcal{G}_t)$.


We conclude with a brief discussion of Proposition 9.15 from the viewpoint of
stochastic calculus.


Definition 9.16. Given the set-up of Section 9.2.2, a non-negative $(\mathcal{G}_t)$-adapted
process $(\lambda_t)$ is called a $(\mathcal{G}_t)$-martingale intensity process of the random time $\tau$ if
$M_t := Y_t - \int_0^{t \wedge \tau} \lambda_s\,ds$ is a $(\mathcal{G}_t)$-martingale.


In reduced-form credit risk models, $(\lambda_t)$ is usually called the default intensity of
the default time $\tau$. It is well known that the martingale intensity $(\lambda_t)$ is uniquely
defined on $\{t < \tau\}$. This is an immediate consequence of general results from
stochastic calculus concerning the uniqueness of semimartingale decompositions
(see, for example, Chapter 2 of Protter (1992)). Martingale intensities are important
tools in the analysis of jump indicators and random times from the viewpoint of
stochastic calculus. In credit risk, martingale default intensities and credit spreads
of defaultable bonds are closely related, as will be discussed in Section 9.4 below.

Using the terminology of Definition 9.16, we may restate Proposition 9.15 in
the form “the $(\mathcal{G}_t)$-martingale intensity of a doubly stochastic random time $\tau$ is
given by its $(\mathcal{F}_t)$-conditional hazard-rate process $(\gamma_t)$”. Outside the realm of doubly
stochastic random times, the relationship between martingale default intensities and
hazard-rate processes becomes more subtle. In fact, in the analysis of reduced-form
credit portfolios one naturally encounters random times which admit a martingale
intensity process in the sense of Definition 9.16, but whose conditional cumulative
hazard process $(\Gamma_t)$ is not absolutely continuous, for instance because it has jumps.
In that case Proposition 9.15 obviously no longer holds.


Notes and Comments 


The material discussed in this section is treated in various sources; our presentation 
is based on the book by Bielecki and Rutkowski (2002), where many extensions of 
our results can also be found. In particular, Bielecki and Rutkowski discuss various 
probabilistic characterizations of doubly stochastic random times. The threshold- 
simulation approach for doubly stochastic random times requires the simulation of 
trajectories of the hazard-rate process. An excellent source for simulation techniques 
for stochastic processes is Glasserman (2003a). 

More general reduced-form models where the default time $\tau$ is not doubly stochastic
are discussed, for example, in Kusuoka (1999), Elliott, Jeanblanc and Yor (2000),
Bélanger, Shreve and Wong (2004), Collin-Dufresne, Goldstein and Hugonnier
(2004) and in Chapter 7 of Bielecki and Rutkowski (2002).


9.3 Financial and Actuarial Pricing of Credit Risk 


Essentially, two approaches are used for the pricing of credit-risky securities: the
financial or risk-neutral pricing approach on the one hand, and the actuarial pricing
approach on the other. Under the risk-neutral pricing approach, prices are computed
as expected discounted values under some equivalent martingale measure
(see below). This approach is based on the notions of absence of arbitrage and
dynamic hedging. Nowadays, the risk-neutral pricing approach is standard for pricing
non-defaultable securities. In credit risk it is used for pricing traded securities
such as corporate bonds and credit default swaps and derivative securities related to
these products.

Figure 9.5. Evolution of the price of $p_1(\cdot, 1)$ in Example 9.17: the bond is priced at 0.941
at $t = 0$ and pays 1.0 (no default) or 0.6 (default, recovery = 60%) at $t = 1$.

In the actuarial approach, prices are computed as the sum of the expected pay-off 
under the physical measure and a risk premium. The size of the risk premium is 
often related to the notion of economic capital. In credit risk, the actuarial approach 
is applied mainly to the pricing of non-traded loans or structured products related 
to illiquid securities. 

In this section we discuss and compare both approaches with a view towards 
pricing credit-risky securities. Our discussion provides the methodological basis for 
the derivation of pricing formulas in subsequent sections. 


9.3.1 Physical and Risk-Neutral Probability Measure 


We begin with a discussion of the relationship between the real-world or physical 
measure, which models the actual probability of default, and an equivalent martin- 
gale measure or risk-neutral measure. We use the following simple example as a 
vehicle for our analysis. 


Example 9.17 (the basic static model). We consider a defaultable zero-coupon
bond with maturity $T$ equal to one year. We assume that the recovery rate $1 - \delta$ is
deterministic and equal to 60%; that the real-world default probability is equal to
$p = 1\%$; and that the risk-free simple interest rate equals 5%. Moreover, we assume
that the current ($t = 0$) price of the bond equals $p_1(0, 1) = 0.941$; and that the price
of the corresponding default-free bond is $p_0(0, 1) = (1.05)^{-1} = 0.952$. The price
evolution of the bond is depicted in Figure 9.5.

The expected discounted value of the bond equals $(1.05)^{-1}(0.99 \cdot 1 + 0.01 \cdot 0.6) =
0.949 > p_1(0, 1)$. We see that the price $p_1(0, 1)$ is smaller than the expected
discounted value of the claim. This is the typical situation in real markets for corporate
bonds, as investors demand a premium for bearing the default risk of the bond. In
real markets, the price of corporate bonds is also affected by tax issues (interest
income from corporate bonds is often taxed at a higher rate than interest income
on treasury bonds) and by liquidity issues; both factors tend to further decrease the
price of corporate bonds relative to treasury bonds. An equivalent martingale measure
or risk-neutral measure is an artificial measure $Q$ equivalent to the physical
probability measure $P$ such that the discounted price process of any security is a
$Q$-martingale. According to a standard result of mathematical finance (the so-called
first fundamental theorem of asset pricing), a model for security prices is arbitrage
free if and, modulo technicalities, only if it admits at least one equivalent martingale
measure $Q$. In our two-state model, $Q$ is simply given by an artificial default
probability $q$ such that $p_1(0, 1) = (1.05)^{-1}((1 - q) \cdot 1 + q \cdot 0.6)$; $q$ is uniquely
determined from this equation and is given by $q = 0.03$. Note that in our example $q$
is bigger than the physical default probability $p = 0.01$; again this is typical for real
markets and reflects the risk premium demanded by buyers of defaultable bonds.
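
The numbers in Example 9.17 are easy to reproduce. The following Python sketch (function names are illustrative assumptions) solves the pricing equation of the two-state model for the risk-neutral default probability and evaluates the expected discounted value under the physical default probability.

```python
def risk_neutral_default_probability(price, rate, recovery):
    """Solve price = (1 + rate)**-1 * ((1 - q) * 1 + q * recovery) for q."""
    return (1.0 - price * (1.0 + rate)) / (1.0 - recovery)

def expected_discounted_value(default_prob, rate, recovery):
    """Discounted expected pay-off of the bond for a given default probability."""
    return ((1.0 - default_prob) + default_prob * recovery) / (1.0 + rate)

q = risk_neutral_default_probability(price=0.941, rate=0.05, recovery=0.6)
physical_value = expected_discounted_value(default_prob=0.01, rate=0.05, recovery=0.6)
print(round(q, 4), round(physical_value, 3))
```

Plugging q back into expected_discounted_value returns the market price 0.941, which is exactly the martingale property of the discounted price in this one-period model.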

The risk-neutral default probability $q$ is closely related to the credit spread of
the defaultable bond (see (8.11)). Since $c(0, 1) = -(\ln p_1(0, 1) - \ln p_0(0, 1))$, we
obtain, in our two-state model,

$$c(0, 1) = -\ln((1 - q) \cdot 1 + q \cdot 0.6) = -\ln(1 - q \cdot 0.4) \approx q \cdot 0.4,$$


i.e. the credit spread is approximately equal to the product of default probability and 
(percentage) loss given default. Similar relationships hold in more general reduced- 
form credit risk models (see Section 9.4.2 below). Hence spread data for corporate 
bonds can be used to estimate risk-neutral default probabilities. This observation 
forms the basis for many empirical studies on the relationship between physical and 
risk-neutral default probabilities; we discuss the findings of a recent extensive study 
below. 
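
To see how tight the spread approximation is in the two-state model of Example 9.17, the following Python sketch compares the exact one-period spread with the product of default probability and loss given default. The function name is an illustrative assumption; the inputs are the values from the example.

```python
import math

def credit_spread(q, loss_given_default):
    """Exact one-period spread c(0, 1) = -ln(1 - q * loss_given_default)."""
    return -math.log(1.0 - q * loss_given_default)

q, lgd = 0.03, 0.4                          # values from Example 9.17
exact = credit_spread(q, lgd)
approx = q * lgd
print(round(exact, 5), round(approx, 5))

# The same spread computed directly from the two bond prices of the example,
# using the unrounded risk-neutral default probability implied by 0.941.
from_prices = -(math.log(0.941) - math.log(1.0 / 1.05))
print(round(from_prices, 5))
```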


From physical to risk-neutral default probabilities. How does the structure of 
credit risk models and hence default probabilities change if we go from the physical 
measure (labelled P) to a risk-neutral measure (labelled Q)? A concise mathemat- 
ical answer to this question requires the use of sophisticated tools from stochastic 
calculus (variants of Girsanov’s theorem for diffusions and point processes) and is 
beyond the scope of this book. We therefore content ourselves with an informal 
discussion of the transition from physical to risk-neutral probabilities in firm-value 
and reduced-form models. 

In firm-value models such as Merton’s model (see Section 8.2.1), when going
from $P$ to $Q$ the drift of the asset-value process $V$ is changed from some arbitrary $\mu$
to the default-free short rate of interest $r$. According to (8.4), in Merton’s model the
physical default probability over the interval $[0, T]$ is given by

$$p = P(V_T \le F) = \Phi\left(\frac{\ln F - \ln V_0 - (\mu - \frac{1}{2}\sigma^2) T}{\sigma \sqrt{T}}\right);$$

the risk-neutral default probability over the same horizon is

$$q = Q(V_T \le F) = \Phi\left(\frac{\ln F - \ln V_0 - (r - \frac{1}{2}\sigma^2) T}{\sigma \sqrt{T}}\right).$$

We obtain from these equations that

$$q = \Phi\left(\Phi^{-1}(p) + \frac{\mu - r}{\sigma} \sqrt{T}\right). \tag{9.10}$$


oO 


Note that the correction term $(\mu - r)/\sigma$ equals the Sharpe ratio of $V$ (a popular
measure of the risk premium earned by the firm). The transition formula (9.10) is
frequently applied in practice to go from physical to risk-neutral default probabilities.
Note, however, that (9.10) is only justified, strictly speaking, in the narrow context
of the Merton model.
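
The transition formula (9.10) can be verified numerically within the Merton model. The following Python sketch uses the standard-library class statistics.NormalDist for the normal df and its inverse; the asset value, default threshold, drift, interest rate and volatility are hypothetical values chosen for illustration. It computes the risk-neutral default probability once directly and once from the physical one via (9.10).

```python
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def merton_default_probability(v0, threshold, drift, sigma, horizon):
    """P(V_T <= threshold) when ln V_T is normal with the given drift and volatility."""
    d = (log(threshold) - log(v0) - (drift - 0.5 * sigma ** 2) * horizon) / (sigma * sqrt(horizon))
    return PHI.cdf(d)

def risk_neutral_from_physical(p, mu, r, sigma, horizon):
    """Transition formula (9.10): q = Phi(Phi^{-1}(p) + (mu - r) / sigma * sqrt(T))."""
    return PHI.cdf(PHI.inv_cdf(p) + (mu - r) / sigma * sqrt(horizon))

# Hypothetical inputs (assumptions for illustration only).
v0, threshold, mu, r, sigma, T = 100.0, 60.0, 0.10, 0.04, 0.25, 1.0
p = merton_default_probability(v0, threshold, mu, sigma, T)
q_direct = merton_default_probability(v0, threshold, r, sigma, T)
q_formula = risk_neutral_from_physical(p, mu, r, sigma, T)
print(p, q_direct, q_formula)
```

Since mu exceeds r in this example, the Sharpe-ratio correction is positive and the risk-neutral default probability is larger than the physical one.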

In standard reduced-form models the default time is modelled as a doubly stochastic
random time with hazard rate $\gamma_t^P = h^P(W_t)$ (under the physical measure $P$) and
$\gamma_t^Q = h^Q(W_t)$ (under a risk-neutral measure $Q$). Here $(W_t)$ is some $d$-dimensional
process representing economic factors, which is adapted to the background filtration
$(\mathcal{F}_t)$; $h^P$ and $h^Q$ are functions from $\mathbb{R}^d$ to $\mathbb{R}_+$. In this context arbitrage theory
alone gives little guidance on the form of the ratio $h^Q/h^P$, the only restriction being
that $h^Q$ must be chosen so that the model is consistent with observed prices of traded
credit-risky securities. Recent research has therefore tried to derive further restrictions
on the ratio $h^Q/h^P$ by bringing in economic arguments (see, for example,
Jarrow, Lando and Yu 2005).

In practice, one usually postulates that $h^P$ and $h^Q$ belong to a given parametric
family of functions from $\mathbb{R}^d$ to $\mathbb{R}_+$. For instance, one might assume that $h^Q = \nu h^P$
for some scaling factor $\nu > 0$. The function $h^P$ can be determined by fitting the
model to historical default probabilities; $\nu$ is found by fitting the model to observed
prices of corporate bonds or credit default swaps. Alternatively, the model is set up
directly under $Q$ and one restricts oneself to determining $h^Q$ from observed market
prices of corporate bonds or credit default swaps; this is the martingale-modelling
approach discussed in Section 9.3.3 below.


Empirical evidence. As discussed above, the risk-neutral default probability of 
a corporation can be estimated from credit-spread data for bonds issued by that 
corporation. By comparing these estimates with estimates for the physical default 
probability—obtained, for instance, from the KMV model introduced in Sec- 
tion 8.2.3—it is possible to gain some empirical evidence on the relationship of 
physical and risk-neutral default probabilities in real markets. Understanding this 
relationship is important, as it enables market participants to use information on his- 
torical default probabilities in pricing credit-risky securities, and conversely to use 
prices of defaultable bonds or market quotes for credit default swaps as additional 
input in determining historical default probabilities. 

An extensive empirical study of the relationship between physical and risk-neutral
default probabilities is Berndt et al. (2004). In this study market quotes for fair
CDS spreads (instead of credit spreads of corporate bonds) are used to infer
risk-neutral default probabilities. In this way, problems related to the differing taxation
of corporate and treasury bonds can be circumvented. The authors ran regression
analyses of the observed spreads for five-year CDSs against five-year EDFs for a
large pool of firms. The five-year EDF of a firm with publicly traded stock is an
annualized estimate of the physical five-year default probability. The computation of
EDFs is described in detail in Section 8.2.3, and annualization is a way of expressing
EDFs for different time horizons on a common yearly scale. A formal treatment of
CDS pricing is given below in Section 9.3.3 (for models where the default time $\tau$
has a deterministic risk-neutral hazard rate) and in Section 9.4.3 (for the case of a
doubly stochastic default time).

For the interpretation of the regression results, it suffices to know that, in an
environment where the default-free interest rate $(r_t)$ and the risk-neutral hazard rate
$(\gamma^Q_{t,i})$ of some firm $i$ do not fluctuate too much, we have an approximate relationship
between the risk-neutral hazard rate $\gamma^Q_{t,i}$, the fair CDS spread $x^*_{t,i}$ observed at time $t$
and the percentage loss given default $\delta_i$ (assumed deterministic) of the form

$$\gamma^Q_{t,i} \approx x^*_{t,i}/\delta_i \tag{9.11}$$

(see Section 9.3.3 for details). Moreover, it is an immediate consequence of (9.7)
that in models with a doubly stochastic default time the risk-neutral hazard rate $\gamma^Q_{t,i}$
is approximately equal to the conditional risk-neutral one-year default probability
$q_{t,i}$ of firm $i$ at time $t$.

Berndt et al. (2004) began by estimating the following simple linear model for
the relationship between the observed swap spread $x^*_{t,i}$ of firm $i$ and the five-year
EDF, labelled $\mathrm{EDF}_{t,i}$, on the same day, both measured in basis points (one basis point
equals 0.01%):

$$x^*_{t,i} = 52.26 + 1.627\,\mathrm{EDF}_{t,i} + \varepsilon_{t,i}. \tag{9.12}$$

The model (9.12) was estimated using a large sample of more than 18 000 CDS-EDF
observations from September 2000 to August 2003 for firms from three industry sectors
(North American Oil and Gas, North American Healthcare and North American
Broadcasting and Entertainment). The authors propose the following interpretation
of this regression result. Under the model (9.12) the fair swap spread $x^*_{t,i}$ increases
by approximately 16 basis points for every 10 basis point increase in the five-year
EDF; neglecting the intercept, which is small, we have approximately that
$x^*_{t,i}/\mathrm{EDF}_{t,i} \approx 1.6$. Assuming that the risk-neutral loss given default, $\delta$, equals, say,
0.75, and that risk-neutral and physical default intensities are relatively stable over
time, this would imply a ratio of risk-neutral to physical default intensity of

$$\frac{\gamma^Q}{\gamma^P} \approx \frac{x^*_{t,i}/\delta}{\mathrm{EDF}_{t,i}} = \frac{x^*_{t,i}}{\delta\,\mathrm{EDF}_{t,i}} \approx \frac{1}{0.75} \cdot 1.6 = 2.13.$$

As explained above, $\gamma^Q/\gamma^P \approx q/p$, so these numbers also relate to the ratio of
risk-neutral and physical default probabilities. A loss given default of $\delta = 0.75$ is
a very conservative (high) estimate. If we take a lower value for $\delta$, say $\delta = 0.5$, we
obtain a ratio of $q/p \approx \frac{1}{0.5} \cdot 1.6 = 3.2$, i.e. the ratio of risk-neutral to actual default
probability gets even higher.
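
The sensitivity of the implied ratio to the assumed loss given default can be tabulated directly. The following Python sketch (the function name is an illustrative assumption) uses the approximate slope of 1.6 quoted above and the two loss-given-default values discussed in the text.

```python
def risk_neutral_to_physical_ratio(spread_per_edf, loss_given_default):
    """gamma^Q / gamma^P is approximately (x* / delta) / EDF = (x* / EDF) / delta."""
    return spread_per_edf / loss_given_default

for delta in (0.75, 0.5):
    print(delta, round(risk_neutral_to_physical_ratio(1.6, delta), 2))
```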

A careful inspection of the CDS-EDF relationship shows that the simple linear
model (9.12) might not be appropriate for a number of reasons. First, the high
intercept of 52.26 basis points is implausible, as it would imply that even for a firm
with historical default probability $p$ close to zero the swap spread is still of the
order of 50 basis points. Second, Berndt et al. (2004) found that the ratio $x^*_{t,i}/\mathrm{EDF}_{t,i}$
varies between industry sectors (reflecting different recovery rates for different
industries) and over time, as is illustrated in Figure 9.6. Third, there seems to be
some concavity in the relationship between swap spreads and EDFs; in particular,
the ratio $x^*_{t,i}/\mathrm{EDF}_{t,i}$ is higher for high-quality firms with low EDF values than for
low-quality firms. For these reasons the authors go on to consider a more refined
logarithmic regression model, which fits the data significantly better.

Figure 9.6. Ratio of one-year risk-neutral and historical default probabilities for
Vintage Petroleum, as estimated by Berndt et al. (2004).

The analysis of Berndt et al. (2004) is corroborated by other empirical work on 
default risk premiums mentioned in Notes and Comments. This empirical research 
clearly shows that physical and risk-neutral default probabilities can differ substan- 
tially, and care must be taken to distinguish between the two concepts. 


9.3.2 Risk-Neutral Pricing and Market Completeness 


It is a fundamental insight of modern mathematical finance that in a complete market
the price of derivative securities can be computed as the mathematical expectation 
of the discounted pay-off under a risk-neutral measure. In this section we explain 
the idea underlying this important result and discuss its applicability to the pricing 
of credit derivatives. We refrain from a general analysis; instead we use variants of 
the simple static model introduced in Example 9.17 as a vehicle for our discussion. 

Consider, in the context of Example 9.17, an investor, for example an investment
bank, who plans to sell credit derivatives on the zero-coupon bond with
price $p_1(\cdot, T)$. In particular, consider a default put option with maturity date $T = 1$.
This contract pays one unit if the bond defaults and zero otherwise; it can be thought 
of as a simplified version of a CDS. Obviously, the pay-off of the default put is 
unknown at date t = 0 and thus constitutes a risk. Therefore, two questions arise 
for our investor: how should the option be priced, and how should the risk incurred 
by selling it be dealt with? The answer given by the modern theory of mathematical 
finance goes back to the seminal papers of Black and Scholes (1973) and Merton 
(1973), who showed that it is often possible to replicate the pay-off of a deriva- 
tive security by (dynamic) trading in the underlying assets. It follows that the risk 
incurred by the seller can be eliminated; moreover, the fair price of the derivative is 
given by the initial price of the replicating portfolio. 

Let us apply this insight to the default put. We form a portfolio in the defaultable
bond and cash, with value at time $t = 1$ equal to the pay-off of the put. At time
$t = 0$ we go short 2.5 units of the bond and hold $2.5/1.05 \approx 2.38$ units of cash. At time
$t = 1$ there are two possibilities for the value $V_1$ of this portfolio.


• Default occurs: in which case $V_1 = (-2.5) \cdot 0.6 + \frac{2.5}{1.05} \cdot 1.05 = 1$.
• No default: in which case $V_1 = (-2.5) \cdot 1 + \frac{2.5}{1.05} \cdot 1.05 = 0$.


In either case the value $V_1$ of the hedge portfolio equals the pay-off of the option.
Hence the fair price at $t = 0$ of the option should equal the value of the hedge
portfolio at $t = 0$ given by $V_0 = (-2.5) \cdot 0.941 + \frac{2.5}{1.05} = 0.0285$; otherwise either
the buyer or the seller could make some riskless profit. To construct the portfolio in
this simple one-period, two-state setting we have to consider two linear equations.
Denote by $\xi_1$ and $\xi_2$ the units of the defaultable bond and the amount of cash in our
portfolio. At time $t = 1$ we must have $\xi_1 \cdot 0.6 + \xi_2 \cdot 1.05 = 1$ (the default case),
and $\xi_1 \cdot 1 + \xi_2 \cdot 1.05 = 0$ (the no-default case), which leads to the above values of
$\xi_1 = -2.5$ and $\xi_2 = 2.5/1.05$.

In mathematical finance a derivative security is called attainable if there is a 
(dynamic) portfolio strategy in traded underlying assets that replicates the pay-off 
of the derivative. The above argument shows that in our simple one-period two-state 
model every derivative security is attainable. Such models are termed complete. 
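
The replication argument can be sketched in a few lines of code. This is only an illustration under the numbers of Example 9.17 (recovery 0.6, riskless rate 5%, bond price 0.941); the function name `replicate` and the constant names are ours. Because the two state equations always have a solution, any pay-off pair can be replicated, which is what completeness means in this model.

```python
RECOVERY = 0.6      # bond value at t = 1 if default occurs (Example 9.17)
GROWTH = 1.05       # value at t = 1 of one unit of cash held from t = 0
BOND_PRICE = 0.941  # p_1(0, 1)


def replicate(payoff_default, payoff_no_default):
    """Return (bond units, cash units, cost at t = 0) of the replicating portfolio.

    Solves  xi1 * RECOVERY + xi2 * GROWTH = payoff_default
            xi1 * 1.0      + xi2 * GROWTH = payoff_no_default
    """
    # Subtracting the second equation from the first eliminates the cash position.
    xi1 = (payoff_default - payoff_no_default) / (RECOVERY - 1.0)
    xi2 = (payoff_no_default - xi1) / GROWTH
    cost = xi1 * BOND_PRICE + xi2
    return xi1, xi2, cost


# Default put: pays 1 in default and 0 otherwise.
xi1, xi2, v0 = replicate(payoff_default=1.0, payoff_no_default=0.0)
```

As a sanity check, `replicate(0.6, 1.0)` should return one unit of the bond, no cash, and a cost equal to the bond price.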

Note that the physical default probability $p$ did not enter the pricing argument.
It is nonetheless possible to compute the fair price of the default put as the
expected value of the discounted pay-off, if the risk-neutral measure $Q$ is used
instead of the physical measure $P$. Recall that the risk-neutral default probability
is given by $q = 0.03$. The expected discounted pay-off under $Q$ is given by
$(1.05)^{-1}(0.97 \cdot 0 + 0.03 \cdot 1) = 0.0285$ and is thus equal to the fair price $V_0$. This is,
of course, not a lucky coincidence. In fact, a basic result from mathematical finance, 
the so-called risk-neutral pricing rule, states that the fair price of any attainable 
claim can be computed as the expected value of the discounted pay-off under a risk- 
neutral measure. Armed with this result, one typically first computes the candidate 
price (the expected value of the discounted pay-off under a risk-neutral measure) 
and second determines the replicating strategy. For this reason a lot of research 
focuses on the problem of computing prices. However, one should bear in mind 
that the main economic justification for computing prices as expected discounted 
value under a risk-neutral measure stems from the hedging argument, and is there- 
fore strictly speaking only justified for attainable claims. This issue has, to a large 
extent, been neglected in the literature on the pricing of credit-risky securities. The 
next example illustrates some of the difficulties arising in incomplete markets, where 
most derivatives are not attainable. 
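
To see numerically that the risk-neutral expectation and the hedging cost coincide in the complete two-state model, one can back out $q$ from the bond price and compare the two numbers. The sketch below uses the inputs of Example 9.17; the helper `risk_neutral_price` is a name we introduce for illustration.

```python
GROWTH = 1.05       # riskless growth factor over one period
RECOVERY = 0.6      # bond value at t = 1 in default
BOND_PRICE = 0.941  # p_1(0, 1)

# Martingale condition for the bond:
#   BOND_PRICE * GROWTH = (1 - q) * 1 + q * RECOVERY
q = (1.0 - GROWTH * BOND_PRICE) / (1.0 - RECOVERY)


def risk_neutral_price(payoff_default, payoff_no_default, q, growth):
    """Expected discounted pay-off under the risk-neutral measure."""
    return (q * payoff_default + (1.0 - q) * payoff_no_default) / growth


put_price = risk_neutral_price(1.0, 0.0, q, GROWTH)

# Cost of the hedge from the text: short 2.5 bonds, hold 2.5/1.05 in cash.
hedge_cost = (-2.5) * BOND_PRICE + 2.5 / GROWTH
```

With the rounded bond price 0.941 the implied $q$ comes out slightly below 0.03, but the two prices still agree, since both rest on the same two state equations.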


Example 9.18 (a model with random recovery). As there is a substantial amount
of randomness in real recovery rates (see Section 8.4.6), it is interesting to study
the impact of random recovery rates on the validity of the above pricing arguments.
Consider the following variant of Example 9.17 with random recovery: the loss given
default may be either 30% or 50%, $p_1(0, 1) = 0.941$, and the riskless interest rate
equals 5%. The price evolution of $p_1(\cdot, 1)$ is illustrated in Figure 9.7. We leave
the physical measure unspecified; we assume only that all three possible outcomes
have strictly positive probability.

Figure 9.7. Evolution of the price of $p_1(\cdot, 1)$ in Example 9.18.

We begin our analysis of this model by determining the equivalent martingale
measures. Put $q_1 := Q(p_1(1, 1) = 0.5)$, $q_2 := Q(p_1(1, 1) = 0.7)$ and
$q_3 := Q(p_1(1, 1) = 1)$, so that $q_3 = 1 - q_1 - q_2$. We obtain the following equation for
$q_1$ and $q_2$,

$$p_1(0, 1) = 1.05^{-1}\bigl(q_1 \cdot 0.5 + q_2 \cdot 0.7 + (1 - q_1 - q_2) \cdot 1\bigr), \qquad (9.13)$$

and the restrictions $q_1 > 0$, $q_2 > 0$, $1 - q_1 - q_2 > 0$. Obviously, $Q$ is no longer
unique. It is easily seen from (9.13) that the set $\mathcal{Q}$ of equivalent martingale measures
is given by

$$\mathcal{Q} = \bigl\{ q \in \mathbb{R}^3 : q_1 \in (0, 0.024),\ q_2 = \tfrac{1}{0.3}\bigl(1 - 1.05 \cdot p_1(0, 1) - 0.5 \cdot q_1\bigr),\ q_3 = 1 - (q_1 + q_2) \bigr\}. \qquad (9.14)$$


It is interesting to look at the boundary cases. For $q_1 = 0$ we obtain $q_2 = 4\%$,
$q_3 = 96\%$; this is the scenario where the risk-neutral default probability $q = q_1 + q_2$
is maximized. For $q_1 = 2.4\%$ we obtain $q_2 = 0$, $q_3 = 97.6\%$; this is the scenario
where $q$ is minimized. Note, however, that the measures $\boldsymbol{q}_0 := (0.024, 0, 0.976)$ and
$\boldsymbol{q}_1 := (0, 0.04, 0.96)$ do not belong to $\mathcal{Q}$, as they are not equivalent to the physical
measure $P$.
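
The one-parameter family of martingale measures can be traced out numerically. The sketch below solves the martingale condition (9.13) for $q_2$ given $q_1$ and checks positivity; the function names are ours, and the coefficients 0.5 and 0.3 are the losses relative to face value in the two default states.

```python
GROWTH = 1.05
BOND_PRICE = 0.941
PAYOFFS = (0.5, 0.7, 1.0)  # bond value at t = 1: low recovery, high recovery, no default


def q2_from_q1(q1):
    """Solve the martingale condition (9.13) for q2, given q1."""
    return (1.0 - GROWTH * BOND_PRICE - 0.5 * q1) / 0.3


def measure(q1):
    """Return the candidate measure (q1, q2, q3) determined by q1."""
    q2 = q2_from_q1(q1)
    return (q1, q2, 1.0 - q1 - q2)


def is_equivalent_martingale_measure(q, tol=1e-12):
    """All three states need strictly positive mass, and (9.13) must hold."""
    positive = all(qi > 0.0 for qi in q)
    discounted = sum(qi * v for qi, v in zip(q, PAYOFFS)) / GROWTH
    return positive and abs(discounted - BOND_PRICE) < tol


q1_upper = (1.0 - GROWTH * BOND_PRICE) / 0.5  # q1 at which q2 reaches zero
q2_upper = q2_from_q1(0.0)                    # q2 when q1 = 0
interior = measure(0.5 * q1_upper)            # a point strictly inside the set
boundary = (0.0, q2_upper, 1.0 - q2_upper)    # satisfies (9.13) but has q1 = 0
```

The boundary point satisfies the martingale condition but fails the positivity check, which is the computational counterpart of the remark that the boundary measures are not equivalent to $P$.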

Consider a derivative security with pay-off $H$ and maturity $T = 1$, such as the
default put with $H = 0$ if $p_1(1, 1) = 1$ and $H = 1$ otherwise. Every price of the
form $H_0 = E^Q(1.05^{-1} H)$ for some $Q \in \mathcal{Q}$ is consistent with no arbitrage and will
therefore be called an admissible value for the derivative. If $\mathcal{Q}$ contains more than
one element, as in our case, there is typically more than one admissible value. For
instance, we obtain for the default put option that

$$\inf_{Q \in \mathcal{Q}} E^Q\Bigl(\frac{H}{1.05}\Bigr) \approx 0.023 \quad\text{and}\quad \sup_{Q \in \mathcal{Q}} E^Q\Bigl(\frac{H}{1.05}\Bigr) \approx 0.038; \qquad (9.15)$$


obviously, the infimum and supremum in (9.15) correspond to the measures $\boldsymbol{q}_0$
and $\boldsymbol{q}_1$, where $q$ is minimized and maximized, respectively. This non-uniqueness of
admissible values reflects the fact that in our three-state model the put is no longer
attainable. In fact, the hedging portfolio $(\xi_1, \xi_2)$ now has to solve the following three
equations:

$$\begin{aligned}
\xi_1 \cdot 0.5 + \xi_2 \cdot 1.05 &= 1 && \text{(default, low recovery)},\\
\xi_1 \cdot 0.7 + \xi_2 \cdot 1.05 &= 1 && \text{(default, high recovery)},\\
\xi_1 \cdot 1 + \xi_2 \cdot 1.05 &= 0 && \text{(no default)}.
\end{aligned} \qquad (9.16)$$


It is immediately seen that the system (9.16) of three equations and only two 
unknowns has no solution, so that the default put is not attainable. This illustrates 
two fundamental results from modern mathematical finance: a claim with bounded 
pay-off is attainable if and only if the set of admissible values consists of a sin- 
gle number; an arbitrage-free market is complete if and only if there is exactly 
one equivalent martingale measure Q. The second result is known as the second 
fundamental theorem of asset pricing. 
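
Both statements can be verified numerically for the default put of Example 9.18. The sketch below, with variable names of our own choosing, computes the bounds of the admissible price range and then shows that a portfolio matching the pay-off in two of the three states necessarily misses it in the third.

```python
GROWTH = 1.05
BOND_PRICE = 0.941

# By the martingale condition (9.13): 0.5 * q1 + 0.3 * q2 = 1 - 1.05 * p_1(0, 1).
slack = 1.0 - GROWTH * BOND_PRICE

# The default put pays 1 in both default states, so its value under Q is (q1 + q2) / 1.05.
q_min = slack / 0.5  # all default mass on the low-recovery state (q2 = 0)
q_max = slack / 0.3  # all default mass on the high-recovery state (q1 = 0)
price_inf = q_min / GROWTH
price_sup = q_max / GROWTH

# Attainability check: solve the last two equations of the three-state hedging
# system, then test the first one.
xi1 = 1.0 / (0.7 - 1.0)  # (default, high recovery) minus (no default)
xi2 = -xi1 / GROWTH      # from the no-default equation
residual_low_recovery = xi1 * 0.5 + xi2 * GROWTH - 1.0
```

A non-zero `residual_low_recovery` means that no pair $(\xi_1, \xi_2)$ satisfies all three equations, in line with the non-degenerate price interval.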


Example 9.18 shows that in an incomplete market conceptual issues arise which 
are not present in models for complete markets. In particular, it is not obvious how 
to choose the correct price of a derivative security from the range of admissible 
values or how to deal with the risk incurred by selling a derivative security. This is 
unfortunate, as realistic models, which capture the dynamics of financial time series, 
are typically incomplete. In recent years a number of interesting concepts for the risk 
management of derivative securities in incomplete markets have been developed. 
These approaches typically propose mitigating the risk by an appropriate trading 
strategy and often suggest a pricing formula for the remaining risk. However, the 
systematic application of these approaches to the pricing and the hedging of credit- 
risky securities is currently still in its infancy, and we refrain from further discussion. 
A brief overview of the existing work on incomplete markets is given in Notes and 
Comments. 


9.3.3 Martingale Modelling


Recall that, according to the first fundamental theorem of asset pricing, a model 
for security prices is arbitrage free if and (essentially) only if it admits at least one 
equivalent martingale measure Q. Moreover, in a complete market, the only thing 
that matters for the pricing of derivative securities is the Q-dynamics of the traded 
underlying assets. When building a model for pricing derivatives it is therefore 
a natural shortcut to model the objects of interest—such as interest rates, default 
times and the price processes of traded bonds—directly, under some exogenously 
specified martingale measure Q. In the literature this approach is termed martingale 
modelling. 

Martingale modelling is particularly convenient if the value of the underlying
assets at some maturity date $T$ is exogenously given, as in the case of zero-coupon
bonds. In that case the price of the underlying asset at time $t < T$ can be computed
as the conditional expectation under $Q$ of the discounted value at maturity. Formally,
denoting by $B(t) > 0$ the so-called numéraire (often the default-free savings
account) and by $\mathcal{G}_t$ the information available to investors at time $t$, we have the
following formula for the price at time $t$ of a security, whose value at $T$ is given by
the $\mathcal{G}_T$-measurable rv $H > 0$:

$$H_t = B(t)\,E^Q\bigl(B(T)^{-1} H \mid \mathcal{G}_t\bigr), \quad t < T. \qquad (9.17)$$


Model parameters are then determined using the requirement that at time t = 0 the 
price of traded securities as computed from the model using (9.17) should coin- 
cide with the price of these securities as observed in the market; this is known as 
calibration of the model to market data. 

Martingale modelling ensures that the resulting model is arbitrage free, which is 
advantageous if one has to model the prices of many different securities simultane- 
ously. Therefore the approach is frequently adopted in default-free term structure 
models and in reduced-form models for credit-risky securities. Martingale mod- 
elling has two drawbacks. First, historical information is, to a large extent, useless 
in estimating model parameters, as these may change in the transition from the real- 
world measure to the equivalent martingale measure. For instance, as explained 
above, physical and risk-neutral default probabilities and default intensities may 
differ substantially. Second, as illustrated in Example 9.18, realistic models for 
pricing credit derivatives are typically incomplete, so that one cannot eliminate all 
risk by dynamic hedging. In those situations one is interested in the distribution of 
the remaining risk under the actual risk measure P, so martingale modelling alone 
is not sufficient. In summary, the martingale-modelling approach is most suitable in 
situations where the market for underlying securities is relatively liquid. In that case 
we have sufficient price information to calibrate our models, and issues of market 
completeness become less relevant. 


Martingale modelling with given CDS spreads. We now use the martingale- 
modelling approach to construct a simple reduced-form pricing model with deter- 
ministic hazard rate for credit derivatives on a given reference entity. We assume 
that market information consists of quotes for fair spreads for CDSs of varying 
maturities on this entity. This example illustrates the model-building process if mar- 
tingale modelling is used. Moreover, the example is of practical relevance: since the 
CDS market is among the most liquid markets for credit-risky securities, the task of 
building a model using CDS spreads as input is frequently encountered in practice. 

Using martingale modelling we model the objects of interest directly under
some martingale measure $Q$. We assume that under $Q$ the default time $\tau$ is
a random time with deterministic risk-neutral hazard rate $\gamma^Q(t)$. For simplicity
we take interest rates and recovery rates (or equivalently loss given default)
to be deterministic. The percentage loss given default is denoted by $\delta \in (0, 1)$.
The continuously compounded interest rate at time $t$ is denoted by $r(t) > 0$, so
$p_0(0, t) = \exp(-\int_0^t r(s)\,ds)$ is the price of the default-free zero-coupon bond with
maturity $t$. This is the simplest type of model that can be calibrated to a given term
structure of default-free interest rates and CDS spreads; generalizations allowing
for stochastic interest rates, recovery rates and hazard rates will be discussed in
Section 9.4.3 below.

We consider the following CDS. We take the notional to be one, so that percentage
loss given default and absolute loss given default are the same. The premium
payments are due at $N$ points in time $0 < t_1 < \cdots < t_N$ (with $t_0 := 0$). If $\tau > t_k$, the
protection buyer pays at $t_k$ a premium of size $x^*(t_k - t_{k-1})$, where $x^*$ denotes the swap
spread. After default, no premium payments are made. If default occurs before the
maturity date $t_N$ of the swap, the protection seller makes a default payment of size $\delta$
to the buyer at the default time $\tau$. In a standard CDS the protection buyer pays the
protection seller at default the part of the premium which has accrued since the last
regular premium payment date; here we ignore these accrued premium payments to
simplify the exposition.

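Given the risk-neutral hazard rate $\gamma^Q$ and a generic spread $x$, we now price the
payments made by the protection buyer (the so-called premium payment leg of the
swap) and the payments made by the protection seller (the default payment leg)
separately. The price of the premium payment leg at $t = 0$ (the expected discounted
value of the payments) is given by

$$\begin{aligned}
V^{\mathrm{prem}}(x; \gamma^Q) &= E^Q\Bigl(\sum_{k=1}^N \exp\Bigl(-\int_0^{t_k} r(u)\,du\Bigr)\, x\,(t_k - t_{k-1})\, I_{\{t_k < \tau\}}\Bigr)\\
&= x \sum_{k=1}^N p_0(0, t_k)(t_k - t_{k-1})\, Q(\tau > t_k),
\end{aligned} \qquad (9.18)$$

which is easily computed using the fact that $Q(\tau > t_k) = \exp(-\int_0^{t_k} \gamma^Q(s)\,ds)$. The
expected discounted value of the default payment leg equals

$$V^{\mathrm{def}}(\gamma^Q) = E^Q\Bigl(\delta \exp\Bigl(-\int_0^{\tau} r(u)\,du\Bigr) I_{\{\tau \le t_N\}}\Bigr).$$

Since $\tau$ has density $f_\tau(t) = \gamma^Q(t) \exp(-\int_0^t \gamma^Q(u)\,du)$, defining $R(u) := r(u) + \gamma^Q(u)$, we get

$$\begin{aligned}
V^{\mathrm{def}}(\gamma^Q) &= \delta \int_0^{t_N} \exp\Bigl(-\int_0^t r(s)\,ds\Bigr) f_\tau(t)\,dt\\
&= \delta \int_0^{t_N} \gamma^Q(t) \exp\Bigl(-\int_0^t R(s)\,ds\Bigr)\,dt.
\end{aligned} \qquad (9.19)$$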


According to market convention there are no payments when two parties enter into
a CDS agreement. This implies that the fair CDS spread $x^*$ has to be chosen such
that the value of the contract at $t = 0$ is equal to zero. Hence $x^*$ is defined by the
relation $V^{\mathrm{prem}}(x^*; \gamma^Q) = V^{\mathrm{def}}(\gamma^Q)$, which is easily solved for $x^*$. Obviously, $x^*$
depends on the intensity function $\gamma^Q$, as $V^{\mathrm{prem}}$ and $V^{\mathrm{def}}$ depend on $\gamma^Q$. Note that in
our pricing argument we have neglected the possibility of default for the protection
seller. More sophisticated CDS-pricing approaches would take this possibility into
account (see Notes and Comments).
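
The two legs and the resulting fair spread are straightforward to evaluate once the interest rate and hazard rate are specified. The following sketch assumes, purely for illustration, a constant interest rate `r` and a constant hazard rate `hazard`, so that the integral in the default leg has a closed form; the function names and all numerical inputs are hypothetical.

```python
import math


def premium_leg(spread, payment_times, r, hazard):
    """Premium leg for constant r and hazard:
    x * sum_k p0(0, t_k) * (t_k - t_{k-1}) * Q(tau > t_k), with t_0 = 0."""
    total, previous = 0.0, 0.0
    for t_k in payment_times:
        total += math.exp(-r * t_k) * (t_k - previous) * math.exp(-hazard * t_k)
        previous = t_k
    return spread * total


def default_leg(lgd, maturity, r, hazard):
    """Default leg for constant r and hazard, integrated in closed form:
    lgd * hazard * (1 - exp(-(r + hazard) * T)) / (r + hazard)."""
    big_r = r + hazard
    if big_r == 0.0:
        return lgd * hazard * maturity
    return lgd * hazard * (1.0 - math.exp(-big_r * maturity)) / big_r


def fair_spread(lgd, payment_times, r, hazard):
    """Spread that equates the two legs; the premium leg is linear in the spread."""
    annuity = premium_leg(1.0, payment_times, r, hazard)
    return default_leg(lgd, payment_times[-1], r, hazard) / annuity


# Hypothetical contract: quarterly premiums over five years.
times = [0.25 * k for k in range(1, 21)]
x_star = fair_spread(lgd=0.6, payment_times=times, r=0.03, hazard=0.02)
```

Because the premium leg is linear in the spread, no root search is needed at this stage: the fair spread is the default leg divided by the value of a unit-spread premium stream.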

Assume now that we observe spreads quoted in the market for one or more CDSs
on the same reference entity. Under the martingale-modelling approach we have to
calibrate our model to the available market information. In the context of our simple
default model we hence have to determine the implied risk-neutral hazard rate $\gamma^Q$,
which ensures that the fair CDS spreads implied by the model equal the spreads
which are quoted in the market.

We consider the following simple example: market information consists of the fair
spread $x^*$ of one CDS with maturity $t_N$; the risk-neutral hazard rate $\gamma^Q$ is constant,
i.e. $\gamma^Q(t) = \bar\gamma^Q$ for some $\bar\gamma^Q > 0$. In this context, by (9.18) and (9.19), the implied
risk-neutral hazard rate $\bar\gamma^Q$ satisfies the equation

$$x^* \sum_{k=1}^N p_0(0, t_k)(t_k - t_{k-1})\, e^{-\bar\gamma^Q t_k} = \delta\, \bar\gamma^Q \int_0^{t_N} p_0(0, t)\, e^{-\bar\gamma^Q t}\,dt. \qquad (9.20)$$

Now, the left-hand side of this equation (the value of the premium payments) is
decreasing in $\bar\gamma^Q$, whereas the right-hand side (the value of the default payments) is
increasing in $\bar\gamma^Q$. Therefore, in this example there is a unique implied risk-neutral
hazard rate, which is easily computed numerically.

If we observe spreads for several CDSs on the same reference entity but with
different maturities, a time-independent risk-neutral hazard rate is generally not
sufficient to calibrate the model to the observed swap spreads. Instead one typically
uses hazard-rate functions $\gamma^Q(t)$ that are piecewise constant or piecewise linear. An
exception occurs in the special case where the spread curve is flat (i.e. all CDSs on
the reference entity have the same spread $x^*$, independent of the maturity), where
the risk-free interest rate is constant and where the time points $t_k$ are equally spaced
(i.e. $t_k - t_{k-1} = \Delta t$ for all $k$). In that case the implied risk-neutral hazard rate $\bar\gamma^Q$ is
given as the solution to the following equation (equation (9.20) for the case $N = 1$):

$$x^* \Delta t\, p_0(0, \Delta t)\, e^{-\bar\gamma^Q \Delta t} = \delta\, \bar\gamma^Q \int_0^{\Delta t} p_0(0, t) \exp(-\bar\gamma^Q t)\,dt. \qquad (9.21)$$

For $\Delta t$ relatively small (quarterly or semiannual spread payments) a good approximation
to the solution of (9.21) is given by $\bar\gamma^Q \approx x^*/\delta$, i.e. by the ratio of the fair
swap spread and the percentage loss given default. This approximation is frequently
used in practice.
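
The calibration step can be sketched as a one-dimensional root search. The code below assumes a constant interest rate and uses plain bisection, which is our choice of method rather than anything prescribed in the text; it relies only on the premium leg decreasing and the default leg increasing in the hazard rate. The function names and the quoted spread of 120 basis points are hypothetical.

```python
import math


def leg_gap(hazard, spread, lgd, payment_times, r):
    """Premium leg minus default leg for constant r and constant hazard rate."""
    premium, previous = 0.0, 0.0
    for t_k in payment_times:
        premium += math.exp(-(r + hazard) * t_k) * (t_k - previous)
        previous = t_k
    big_r = r + hazard
    maturity = payment_times[-1]
    protection = lgd * hazard * (1.0 - math.exp(-big_r * maturity)) / big_r
    return spread * premium - protection


def implied_hazard(spread, lgd, payment_times, r, upper=10.0, tol=1e-12):
    """Bisection on the gap, which is positive near zero and decreasing in the hazard rate."""
    lower = 0.0
    if leg_gap(upper, spread, lgd, payment_times, r) > 0.0:
        raise ValueError("upper bracket too small for this spread")
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if leg_gap(mid, spread, lgd, payment_times, r) > 0.0:
            lower = mid  # premium leg still worth more: hazard rate too low
        else:
            upper = mid
    return 0.5 * (lower + upper)


# Hypothetical quote: 120 bp on a five-year CDS with quarterly premiums.
times = [0.25 * k for k in range(1, 21)]
gamma = implied_hazard(spread=0.012, lgd=0.6, payment_times=times, r=0.03)
approx = 0.012 / 0.6  # rule of thumb: spread divided by loss given default
```

For these inputs the implied hazard rate lies slightly below the rule-of-thumb value, since premiums are paid at the end of each period while protection accrues continuously.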


Remark 9.19. Recall that in reduced-form models historical default information 
is of no use in determining the form of the risk-neutral hazard rate. For our simple 
example it follows that the hypothesis of a constant risk-neutral hazard rate cannot 
be tested by looking at historical default patterns: it is perfectly possible that under 
the historical measure P default intensities are time dependent, but that under a 
risk-neutral measure Q they are constant, and vice versa. The only way of testing 
the assumption that $\gamma^Q(t)$ is a constant or has any other functional form would be to
test the implications of this assumption on the dynamics of observable CDS spreads 
or prices of other traded credit-risky securities. 


9.3.4 The Actuarial Approach to Credit Risk Pricing 


The actuarial approach is mainly used for the pricing of loans and related products, 
which are relatively illiquid. Under the actuarial approach the total spread a loan 
should earn (the difference between the interest rate which should be charged for
the loan and the interest rate on a default-free bond with similar characteristics) is
computed according to the following schematic formula: 


total spread = administrative cost + expected loss + risk premium. (9.22) 


Administrative cost is of no concern to us here. Expected loss refers to expected 
loss under the physical probability measure and, assuming independence between 
default and recovery rates, it is given by the product of the annual default probability 
and the expected percentage loss given default. The determination of an appropriate 
risk premium is more involved from a methodological viewpoint and is discussed 
below. Formula (9.22) for the total spread has the same structure as the standard 
actuarial premium principles, hence the name actuarial approach to pricing credit 
risk. Of course, in practice a formal loan-pricing rule of the form (9.22) is not applied 
rigorously across the board; other factors, such as competitive pressure from other 
lenders or a long-standing business relationship between borrower and lender, play 
an important role in determining the yield spread for a loan. 


Risk premiums and economic capital. In modern loan-pricing systems the risk 
premium of a loan is computed by applying a target interest rate or hurdle rate to the 
economic capital required as buffer against losses from the deal. Hurdle rates are set 
by management; they reflect the return on equity aspired to by a financial institution. 
Under a modern economic capital framework, the economic capital required for a 
particular loan is determined in two steps. First, the economic capital of the entire 
credit portfolio is determined. Here the financial industry typically distinguishes 
between the so-called expected loss, given by the expected value E(L), and the so- 
called unexpected loss, given by UL := L — E(L). Usually, the economic capital 
for the entire loan portfolio is determined by applying a risk measure such as VaR 
or expected shortfall to UL, whereas the expected loss is charged directly to the 
borrower according to the general actuarial loan-pricing formula (9.22). 

In a second step the total economic capital needs to be allocated to the individual 
loans in the portfolio, a process called economic capital allocation. A fair economic 
capital allocation has to reflect the contributions of the individual loans to the total 
risk of the portfolio. For instance, a large loan, which is (almost) independent of the 
overall portfolio, might contribute less to total risk than a smaller loan, which is likely 
to default in circumstances where the portfolio produces large losses. Formally, 
economic capital allocation is done using a capital allocation principle such as 
standard deviation contributions or expected shortfall contributions (see Section 6.3 
for details). 
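
As a purely schematic illustration of formula (9.22), the sketch below adds up the three components for a single loan, per unit of notional. All inputs are hypothetical, the function name is ours, and the risk premium is taken to be the hurdle rate times the economic capital already allocated to the loan, which is a simplified reading of the description above.

```python
def actuarial_spread(admin_cost, pd_annual, expected_lgd, hurdle_rate, allocated_capital):
    """Total spread per unit of notional, following the schematic formula (9.22)."""
    # Expected loss under the physical measure, assuming default and recovery independent.
    expected_loss = pd_annual * expected_lgd
    # Hurdle rate applied to the economic capital allocated to this loan.
    risk_premium = hurdle_rate * allocated_capital
    return admin_cost + expected_loss + risk_premium


# Hypothetical loan: 1% annual default probability, 45% expected loss given default,
# 4% of notional held as economic capital, 15% hurdle rate, 20 bp administrative cost.
spread = actuarial_spread(
    admin_cost=0.0020,
    pd_annual=0.01,
    expected_lgd=0.45,
    hurdle_rate=0.15,
    allocated_capital=0.04,
)
```

The allocated capital is the only input that depends on the rest of the portfolio; it would come from the allocation step just described.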


Financial and actuarial pricing compared. We conclude this section with a brief 
comparison of the two pricing methodologies. The financial-pricing approach is a 
relative pricing theory, which explains prices of credit products in terms of observ- 
able prices of other securities. If properly applied, it leads to arbitrage-free prices of 
credit-risky securities, which are consistent with prices quoted in the market. These 
features make the financial-pricing approach the method of choice in an environment 
where credit risk is actively traded and, in particular, for valuing credit instruments
when the market for related products is relatively liquid. On the other hand, since
financial-pricing models have to be calibrated to prices of traded credit instruments, 
they are difficult to apply when we lack sufficient market information. Moreover, in 
such cases prices quoted using an ad hoc choice of some risk-neutral measure are 
more or less “plucked out of thin air”. 

The actuarial pricing approach is an absolute pricing approach, based on the 
paradigm of risk bearing: a credit product such as a loan is taken on the balance 
sheet if the spread earned on the loan is deemed by the lender to be a sufficient 
compensation for the risk contribution of the loan to the total risk of the lending 
portfolio. Moreover, the approach relies mainly on historical default information. 
Therefore, the actuarial approach is well suited to situations where the market for 
related credit instruments is relatively illiquid, such that little or no price information 
is available; loans to medium or small businesses are a prime case in point. On the 
other hand, the approach does not necessarily lead to prices that are consistent (in 
the sense of absence of arbitrage) across products or that are compatible with quoted 
market prices for credit instruments, so it is less suitable for a trading environment. 

As markets for credit products become more and more liquid, the financial- 
valuation paradigm is gaining in importance. This transition poses a challenge for 
risk management in financial institutions, since it may well be that a particular 
credit risk is priced differently by different parts of an institution, such as the loan 
department and a trading desk for credit derivatives. It is the task of a sound risk- 
management process to ensure that these inconsistencies are kept to a minimum; a 
profound understanding of the differences between financial and actuarial valuation 
is an important prerequisite for that. 


Notes and Comments 


Theoretical results on the relationship between physical and risk-neutral default 
probabilities were obtained by Artzner and Delbaen (1995), Jarrow, Lando and Yu 
(2005), Giesecke and Goldberg (2005) and others. General mathematical results 
for the behaviour of point processes (such as default indicators) under a change 
of measure (Girsanov-type theorems) can be found in Brémaud (1981) or Jacod 
and Shiryaev (1987). Empirical studies on the relationship between actual and risk- 
neutral default probabilities include Fons (1994), Bohn (2000), Driessen (2005), 
Huang and Huang (2003) and Berndt et al. (2004). In their paper, Berndt et al. go 
beyond the regression analysis presented in our text and estimate a full time series 
model for the joint evolution of risk-neutral and actual default intensities. 

The fundamental theorems of asset pricing are discussed in most textbooks on 
mathematical finance (see, for example, Bingham and Kiesel 1998; Duffie 2001; 
Shreve 2004). In recent years a number of interesting approaches to the risk manage- 
ment of derivative securities in incomplete markets have been developed. Quadratic 
hedging approaches were first developed by Föllmer and Sondermann (1986) and
Föllmer and Schweizer (1991); Schweizer (2001b) is an excellent survey. The theory
of superhedging is developed in El Karoui and Quenez (1995) and Kramkov
(1996); the related idea of quantile hedging is explored in Föllmer and Leukert
(1999). Utility-based approaches to pricing and hedging in incomplete markets are
discussed in Delbaen et al. (2002) and Becherer (2004); the latter paper explicitly 
considers applications of utility-based hedging strategies to credit risk models. A 
discussion of incomplete-market models in discrete time can be found in Föllmer
and Schied (2004). Early papers dealing with market incompleteness in credit risk
models include Becherer (2004), Becherer and Schweizer (2005) and Bielecki,
Jeanblanc and Rutkowski (2004).

The term martingale modelling was coined by Björk (1998) in the context of
default-free short-rate models (see also Baxter and Rennie 1996). The pricing of
CDSs is discussed in most standard textbooks on credit risk models (see, for example,
Bielecki and Rutkowski 2002; Bluhm, Overbeck and Wagner 2002; Duffie and
Singleton 2003; Lando 2004; Schönbucher 2003). For results on CDS pricing with
possible default of protection seller see, for example, Hull and White (2001).

The relationship between actuarial and financial-pricing approaches is discussed 
by Jensen and Nielsen (1996), Embrechts (2000), Schweizer (2001a) and Embrechts, 
Frey and Furrer (2001), among others. 


9.4 Pricing with Doubly Stochastic Default Times 


The main result of this section concerns the pricing of two types of contingent
claims that can be used as building blocks for constructing the pay-off of many
important credit-risky securities. We will show that, for a default time which is
doubly stochastic, the computation of prices for these claims can be reduced to a
pricing problem for a corresponding default-free claim if we adjust the interest rate
and replace the default-free interest rate $r_t$ by the sum $R_t = r_t + \gamma_t$ of the
default-free interest rate and the hazard rate of the default time.


9.4.1 Recovery Payments of Corporate Bonds 


To clarify the form of our building blocks we briefly survey different models for the
recovery of corporate zero-coupon bonds. As in previous sections, we denote the
price at time $t$ of a corporate zero-coupon bond with maturity $T > t$ by $p_1(t, T)$;
$p_0(t, T)$ denotes the price of the corresponding default-free zero-coupon bond. The
face value of these bonds is always taken to be one. The following three recovery
models are frequently used in the literature.


Recovery of Treasury (RT). The RT model was proposed by Jarrow and Turnbull
(1995). Under RT, if default occurs at some point in time $\tau \le T$, the
owner of the defaulted bond receives $(1 - \delta_\tau)$ units of the default-free zero-coupon
bond $p_0(\cdot, T)$ at time $\tau$, where the rv $\delta_\tau$ models the loss given default.
At maturity $T$ the holder of the defaultable bond therefore receives the payment
$p_1(T, T) = I_{\{\tau > T\}} + (1 - \delta_\tau) I_{\{\tau \le T\}}$. In particular, if $\delta_\tau = \delta$ for some constant
$\delta \in (0, 1)$, we get $p_1(T, T) = (1 - \delta) + \delta I_{\{\tau > T\}}$, so the price of the corporate bond
at time $t < T$ equals the sum of $(1 - \delta) p_0(t, T)$ and $\delta$ times the price of the
claim $I_{\{\tau > T\}}$.


Recovery of Face Value (RF). Under RF, if default occurs at $\tau \le T$, the holder of the
bond receives a (possibly random) recovery payment of size $(1 - \delta_\tau)$ immediately
at the default time $\tau$. The value at maturity $T$ is therefore given by

$$p_1(T, T) = I_{\{\tau > T\}} + \frac{1 - \delta_\tau}{p_0(\tau, T)}\, I_{\{\tau \le T\}}.$$

Even with deterministic loss given default $\delta_\tau = \delta$ and deterministic interest rates,
the value at maturity of the recovery payment is random as it depends on the exact
timing of default. This makes the pricing of recovery payments under RF more
difficult than under RT.
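
The difference between the two recovery conventions is easy to see in a small sketch. We assume, for illustration only, a constant loss given default and a flat default-free rate `r`, so that a payment received at the default time and reinvested at the default-free rate grows by the factor $\exp(r(T - \tau)) = 1/p_0(\tau, T)$ until maturity. The function names and parameter values are hypothetical.

```python
import math


def payoff_rt(default_time, maturity, lgd):
    """Recovery of Treasury: value at maturity is 1 without default, else 1 - lgd."""
    return 1.0 if default_time > maturity else 1.0 - lgd


def payoff_rf(default_time, maturity, lgd, r):
    """Recovery of Face Value: (1 - lgd) is paid at default and rolled forward
    to maturity at the flat default-free rate r."""
    if default_time > maturity:
        return 1.0
    return (1.0 - lgd) * math.exp(r * (maturity - default_time))


# Same loss given default, two different default dates on a five-year bond.
early = payoff_rf(default_time=1.0, maturity=5.0, lgd=0.5, r=0.04)
late = payoff_rf(default_time=4.0, maturity=5.0, lgd=0.5, r=0.04)
rt_value = payoff_rt(default_time=1.0, maturity=5.0, lgd=0.5)
```

Under RT the value at maturity is the same whenever default happens before $T$, whereas under RF an earlier default yields a larger value at maturity because the recovery payment earns interest for longer.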


Recovery of Market Value (RM). This recovery assumption has been put forward by
Duffie and Singleton (1999a); its main virtue is the fact that it leads to particularly
simple pricing formulas for corporate bonds. Under RM it is assumed that the
recovery payment at the default time $\tau \le T$ is given by a fraction $(1 - \delta_\tau)$ of the
pre-default value of the bond. Obviously, this is a recursive definition, as the pre-default
value depends on the recovery payment. Nonetheless, under some assumptions it is
possible to obtain a unique price for corporate bonds, where recovery is modelled
using the RM assumption (see Proposition 9.24 below).

In real markets recovery is a complex issue with many legal and institutional 
features, and all three recovery models are at best a crude approximation of reality. 
The RF-assumption comes closest to legal practice, as debt with the same seniority 
is assigned the same (fractional) recovery, independent of the maturity. On the other 
hand, for “extreme” parameter values (long maturities and high risk-free interest 
rate), RF may lead to negative credit spreads, as we will see in Section 9.5.3 below. 
Moreover, the RF model leads to slightly more involved pricing formulas for cor- 
porate bonds than the RM and RT models. Empirical evidence on recovery rates for 
loans and corporate bonds is discussed in Section 8.4.6. 


9.4.2 The Model 


We consider a firm whose default time is given by a doubly stochastic random time
as in Section 9.2.3. The economic background filtration represents the information
generated by an arbitrage-free and complete model for non-defaultable security
prices. More precisely, let $(\Omega, \mathcal{F}, (\mathcal{F}_t), Q)$ denote a filtered probability space, where
$Q$ is already the equivalent martingale measure. Prices of default-free securities such
as default-free bonds are $(\mathcal{F}_t)$-adapted processes. By $(r_t)$ we denote the default-free
rate of interest; $B_t = \exp(\int_0^t r_s \, ds)$ models the default-free savings account.

Let $\tau$ be the default time of some company under consideration and let $Y_t = I_{\{\tau \le t\}}$
be the associated default indicator process. As in Section 9.2.2 we set
$\mathcal{H}_t = \sigma(\{Y_s : s \le t\})$ and $\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{H}_t$; we assume that default is observable and that investors
have access to the information contained in the background filtration $(\mathcal{F}_t)$, so that
the information available to investors at time $t$ is given by $\mathcal{G}_t$. We consider a market
for credit products which is liquid enough that we may use the martingale-modelling
approach, and we use $Q$ as the pricing measure for defaultable securities. Finally,
we assume that, under $Q$, the default time $\tau$ is a doubly stochastic random time with
background filtration $(\mathcal{F}_t)$ and hazard-rate process $(\gamma_t)$. This latter assumption is
crucial for the results that follow.


9.4.3 Pricing Formulas 
Definition 9.20. We introduce the following building blocks.


(i) A vulnerable claim, i.e. an $\mathcal{F}_T$-measurable promised payment $X$ which is
made at time $T$ if there is no default; the actual payment of the vulnerable
claim equals $X I_{\{\tau > T\}}$.


(ii) A recovery payment at the time of default of the form $Z_\tau I_{\{\tau \le T\}}$, where
$Z = (Z_t)_{t \ge 0}$ is an $(\mathcal{F}_t)$-adapted stochastic process and where $Z_\tau$ is short
for $Z_{\tau(\omega)}(\omega)$. $T$ is the maturity of the recovery payment.


Example 9.21 (corporate bonds). The actual payments of a corporate zero-coupon
bond can be represented as a combination of a vulnerable claim and a recovery
payment. Suppose that the loss given default is given by some $(\mathcal{F}_t)$-adapted process
$(\delta_t)$ with values in $(0, 1)$. Under the RT hypothesis the actual payments are given by
the vulnerable claim $I_{\{\tau > T\}}$ and the recovery payment $(1 - \delta_\tau) p_0(\tau, T) I_{\{\tau \le T\}}$. In
the case where $\delta_t \equiv \delta$ for some $\delta \in (0, 1)$, the pay-off at maturity simplifies further
to the sum of the deterministic payment $(1 - \delta)$ and the vulnerable claim $\delta I_{\{\tau > T\}}$.

Under RF the actual payment of the bond consists of the vulnerable claim $I_{\{\tau > T\}}$
and the recovery payment $(1 - \delta_\tau) I_{\{\tau \le T\}}$. Obviously, since coupon-paying corporate
bonds can be represented as a portfolio of zero-coupon bonds issued by a
corporation, coupon-paying bonds can also be constructed from building blocks (i)
and (ii). However, see Remark 9.25 for a word of warning on the validity of linear
pricing rules in reduced-form models.


Example 9.22 (vulnerable option). Consider a call option with exercise price $K$
and maturity $T$ on some default-free security $(S_t)$, and assume that the writer of the
option may default. Assume that in case of default of the writer at time $\tau \le T$ the
owner of the option receives a fraction $(1 - \delta_\tau)$ of the intrinsic value of the option at
the time of default. This can be modelled as a combination of the vulnerable claim
$(S_T - K)^+ I_{\{\tau > T\}}$ and the recovery payment $(1 - \delta_\tau)(S_\tau - K)^+ I_{\{\tau \le T\}}$.


Credit default swaps can also be viewed as a combination of vulnerable claims 
and a recovery payment (see Section 9.4.4 below). 

According to (9.17), we obtain the following formula for the price at time $t$ of an
arbitrary, non-negative, $\mathcal{G}_T$-measurable contingent claim $H$:

$$H_t = E^Q\Big(\exp\Big(-\int_t^T r_s \, ds\Big) H \,\Big|\, \mathcal{G}_t\Big). \tag{9.23}$$

Consider a default-free claim with $\mathcal{F}_T$-measurable pay-off $X$. Since $\tau$ is a doubly
stochastic random time, the additional information about the default history contained
in $(\mathcal{H}_t)$ is of no use in computing the conditional expectation (9.23), and we
have

$$E^Q\Big(\exp\Big(-\int_t^T r_s \, ds\Big) X \,\Big|\, \mathcal{G}_t\Big) = E^Q\Big(\exp\Big(-\int_t^T r_s \, ds\Big) X \,\Big|\, \mathcal{F}_t\Big). \tag{9.24}$$

A formal proof of this equality can be based on the representation of $\tau$ obtained
in Lemma 9.13; we omit the details. Relation (9.24) shows that we may write the
price of a non-negative default-free claim $X$ as $X_t = E^Q(\exp(-\int_t^T r_s \, ds) X \mid \mathcal{F}_t)$,
which is obviously an $(\mathcal{F}_t)$-adapted process. In particular, it does not matter if we
model default-free security prices using $(\mathcal{F}_t)$ or the larger filtration $(\mathcal{G}_t)$. In the
following theorem we show that, in a similar vein, the pricing of the building blocks
introduced in Definition 9.20 can be reduced to a pricing problem in a default-free
security market model with adjusted default-free interest rate.


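Theorem 9.23. Suppose that, under $Q$, $\tau$ is doubly stochastic with background
filtration $(\mathcal{F}_t)$ and hazard-rate process $(\gamma_t)$. Define $R_s := r_s + \gamma_s$. Assume that the
rvs $\exp(-\int_t^T r_s \, ds)|X|$ and $\int_t^T |Z_s \gamma_s| \exp(-\int_t^s R_u \, du) \, ds$ are all integrable with
respect to $Q$. Then the following identities hold:

$$E^Q\Big(\exp\Big(-\int_t^T r_s \, ds\Big) I_{\{\tau > T\}} X \,\Big|\, \mathcal{G}_t\Big) = I_{\{\tau > t\}} E^Q\Big(\exp\Big(-\int_t^T R_s \, ds\Big) X \,\Big|\, \mathcal{F}_t\Big), \tag{9.25}$$

$$E^Q\Big(I_{\{\tau > t\}} \exp\Big(-\int_t^\tau r_s \, ds\Big) Z_\tau I_{\{\tau \le T\}} \,\Big|\, \mathcal{G}_t\Big) = I_{\{\tau > t\}} E^Q\Big(\int_t^T Z_s \gamma_s \exp\Big(-\int_t^s R_u \, du\Big) ds \,\Big|\, \mathcal{F}_t\Big). \tag{9.26}$$


Proof. The integrability conditions ensure that all conditional expectations are well
defined. We start with the pricing formula (9.25) for the vulnerable claim. Define the
$\mathcal{F}_T$-measurable rv $\tilde{X} := \exp(-\int_t^T r_s \, ds) X$. We obtain, using Corollary 9.10 with
$s = T$ and $\Gamma_t = \int_0^t \gamma_s \, ds$, that

$$E^Q(\tilde{X} I_{\{\tau > T\}} \mid \mathcal{G}_t) = I_{\{\tau > t\}} E^Q(\exp(-(\Gamma_T - \Gamma_t)) \tilde{X} \mid \mathcal{F}_t).$$

Using the relation $\Gamma_T - \Gamma_t = \int_t^T \gamma_s \, ds$ and the definition of $\tilde{X}$, we immediately
obtain that the right-hand side equals $I_{\{\tau > t\}} E^Q(\exp(-\int_t^T R_s \, ds) X \mid \mathcal{F}_t)$. Next we
turn to (9.26). We obtain from Lemma 9.9 that

$$E^Q\Big(I_{\{\tau > t\}} \exp\Big(-\int_t^\tau r_s \, ds\Big) Z_\tau I_{\{\tau \le T\}} \,\Big|\, \mathcal{G}_t\Big) = I_{\{\tau > t\}} \frac{E^Q(I_{\{\tau > t\}} \exp(-\int_t^\tau r_s \, ds) Z_\tau I_{\{\tau \le T\}} \mid \mathcal{F}_t)}{P(\tau > t \mid \mathcal{F}_t)}. \tag{9.27}$$

Now note that

$$P(\tau \le t \mid \mathcal{F}_T) = 1 - \exp\Big(-\int_0^t \gamma_s \, ds\Big),$$

so the conditional density of $\tau$ given $\mathcal{F}_T$ equals $f_{\tau \mid \mathcal{F}_T}(t) = \gamma_t \exp(-\int_0^t \gamma_s \, ds)$.
Hence

$$E^Q\Big(I_{\{\tau > t\}} \exp\Big(-\int_t^\tau r_s \, ds\Big) Z_\tau I_{\{\tau \le T\}} \,\Big|\, \mathcal{F}_T\Big) = \int_t^T \exp\Big(-\int_t^s r_u \, du\Big) Z_s \gamma_s \exp\Big(-\int_0^s \gamma_u \, du\Big) ds$$

$$= \exp\Big(-\int_0^t \gamma_u \, du\Big) \int_t^T Z_s \gamma_s \exp\Big(-\int_t^s R_u \, du\Big) ds.$$

Hence we obtain, using iterated conditional expectations, that

$$E^Q\Big(I_{\{\tau > t\}} \exp\Big(-\int_t^\tau r_s \, ds\Big) Z_\tau I_{\{\tau \le T\}} \,\Big|\, \mathcal{F}_t\Big) = \exp\Big(-\int_0^t \gamma_u \, du\Big) E^Q\Big(\int_t^T Z_s \gamma_s \exp\Big(-\int_t^s R_u \, du\Big) ds \,\Big|\, \mathcal{F}_t\Big);$$

the identity (9.26) follows because of (9.27), since $P(\tau > t \mid \mathcal{F}_t) = \exp(-\int_0^t \gamma_u \, du)$.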
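The two identities of Theorem 9.23 can be checked numerically in the simplest possible setting. The following sketch assumes constant $r$, constant hazard rate $\gamma$, $X = 1$, $Z_s \equiv 1$ and $t = 0$, so that $\tau$ is exponentially distributed with parameter $\gamma$; these simplifications and all function names are illustrative assumptions, not part of the model above. The right-hand sides of (9.25) and (9.26) are then available in closed form and are compared with a Monte Carlo estimate of the left-hand sides.

```python
import math
import random


def vulnerable_claim_price(r, gamma, T, X=1.0):
    """Right-hand side of (9.25) at t = 0 for constant r and gamma."""
    return X * math.exp(-(r + gamma) * T)


def recovery_payment_price(r, gamma, T, Z=1.0):
    """Right-hand side of (9.26) at t = 0 for constant r, gamma and Z_s = Z:
    the integral of Z * gamma * exp(-(r + gamma) s) over [0, T]."""
    R = r + gamma
    return Z * gamma * (1.0 - math.exp(-R * T)) / R


def monte_carlo_prices(r, gamma, T, X=1.0, Z=1.0, n_paths=200_000, seed=0):
    """Left-hand sides of (9.25) and (9.26): simulate tau ~ Exp(gamma) and
    average the discounted actual payments."""
    rng = random.Random(seed)
    vulnerable, recovery = 0.0, 0.0
    for _ in range(n_paths):
        tau = rng.expovariate(gamma)
        if tau > T:
            vulnerable += math.exp(-r * T) * X    # promised payment at T
        else:
            recovery += math.exp(-r * tau) * Z    # recovery paid at tau
    return vulnerable / n_paths, recovery / n_paths
```

With constant parameters the adjusted rate $R = r + \gamma$ simply replaces $r$ in the discount factor, which is the content of the theorem in this special case.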


9.4.4 Applications 


Credit default swaps. We extend our analysis of Section 9.3.3 and discuss the
pricing of CDSs in the model introduced in Section 9.4.2. This allows us to incorporate
stochastic interest rates, recovery rates and hazard rates into our analysis.
As in our previous analysis, the premium payments are due at $N$ points in time
$0 < t_1 < \cdots < t_N$; at a pre-default date $t_k$, the protection buyer pays a premium of
size $x(t_k - t_{k-1})$, where $x$ denotes the swap spread in percentage points (again we
take the nominal of the swap to be one). Moreover, if $\tau \le t_N$, there is an accrued
premium payment of size $x(\tau - t_{k-1})$, provided that $t_{k-1} < \tau \le t_k$. If $\tau \le t_N$, the
protection seller makes the default payment of size $\delta_\tau$ to the buyer at the default
time $\tau$, where the percentage loss given default is now a general $(\mathcal{F}_t)$-adapted process.
Using Theorem 9.23, both legs of the swap can be priced. The regular premium
payments constitute a sequence of vulnerable claims. Using (9.25), we obtain, for
the fair price in $t = 0$,

$$V^{\text{prem},1} = \sum_{k=1}^N E^Q\Big(\exp\Big(-\int_0^{t_k} r_u \, du\Big) x (t_k - t_{k-1}) I_{\{\tau > t_k\}}\Big) = x \sum_{k=1}^N (t_k - t_{k-1}) E^Q\Big(\exp\Big(-\int_0^{t_k} R_u \, du\Big)\Big).$$

The accrued premium payments constitute a recovery payment, where $Z$ is given
by $Z_s = x \sum_{k=1}^N (s - t_{k-1}) I_{\{t_{k-1} < s \le t_k\}}$; by (9.26) the fair price in $t = 0$ is given by

$$V^{\text{prem},2} = x \sum_{k=1}^N E^Q\Big(\int_{t_{k-1}}^{t_k} (s - t_{k-1}) \gamma_s \exp\Big(-\int_0^s R_u \, du\Big) ds\Big).$$

The default payments also form a recovery payment, this time with $Z_s = \delta_s$ and
maturity $t_N$, so their value is given by $V^{\text{def}} = E^Q(\int_0^{t_N} \delta_s \gamma_s \exp(-\int_0^s R_u \, du) \, ds)$.
Hence we have reduced the pricing of credit default swaps to a pricing problem in
the default-free world. Methods for solving this problem will be discussed in the
next section.
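As an illustration of how the three CDS legs combine into a fair spread, the following sketch evaluates them for constant $r$, $\gamma$ and $\delta$, where all expectations reduce to elementary integrals. The constant-parameter setting, the convention $t_0 = 0$ and the function names are assumptions made for this example only; the fair spread is the value of $x$ that equates the premium legs with the default leg.

```python
import math


def cds_legs(payment_dates, r, gamma, delta):
    """Return (regular premium leg per unit spread, accrued premium leg per
    unit spread, default leg) at t = 0 for constant r, gamma, delta.
    Assumes the first accrual period starts at t_0 = 0."""
    R = r + gamma
    regular, accrued, prev = 0.0, 0.0, 0.0
    for t_k in payment_dates:
        dt = t_k - prev
        # (t_k - t_{k-1}) * E[exp(-int_0^{t_k} R_u du)]
        regular += dt * math.exp(-R * t_k)
        # int_{t_{k-1}}^{t_k} (s - t_{k-1}) * gamma * exp(-R s) ds in closed form
        accrued += gamma * math.exp(-R * prev) * (
            1.0 - math.exp(-R * dt) * (1.0 + R * dt)) / R**2
        prev = t_k
    t_N = payment_dates[-1]
    # int_0^{t_N} delta * gamma * exp(-R s) ds
    default_leg = delta * gamma * (1.0 - math.exp(-R * t_N)) / R
    return regular, accrued, default_leg


def fair_spread(payment_dates, r, gamma, delta, accrued=True):
    """Spread x such that x * (premium legs per unit spread) = default leg."""
    regular, acc, default_leg = cds_legs(payment_dates, r, gamma, delta)
    annuity = regular + (acc if accrued else 0.0)
    return default_leg / annuity


semiannual_dates = [0.5 * k for k in range(1, 11)]   # five years, hypothetical
spread = fair_spread(semiannual_dates, r=0.03, gamma=0.05, delta=0.5)
```

In this setting the fair spread lies slightly above $\delta\gamma$, and dropping the accrued premium raises it further, because the protection buyer then pays less in default scenarios.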


Recovery of market value. Next we turn to the pricing of credit-risky securities
whose recovery payment is described by the RM assumption introduced in
Section 9.4.1. More precisely, we consider a claim whose pay-off consists of the
vulnerable claim $X$ and a recovery payment of size $(1 - \delta_\tau) V_\tau I_{\{\tau \le T\}}$, where the
$(\mathcal{F}_t)$-adapted process $(\delta_t)$ with values in $(0, 1)$ gives the percentage loss given default of the
claim and where the $(\mathcal{F}_t)$-adapted process $(V_t)$ gives the pre-default value of the
claim. Note that this is a recursive definition, as the pre-default value at time $t$ also
depends on the form of the future recovery payments in the time period $(t, T]$.
Nonetheless, we have the following result.


Proposition 9.24. Suppose that, under $Q$, $\tau$ is doubly stochastic with hazard-rate
process $(\gamma_t)$, that $X$ is integrable, and that the RM assumption holds. Then the
pre-default value process $(V_t)$ is uniquely determined and is given by

$$V_t = E^Q\Big(\exp\Big(-\int_t^T (r_s + \delta_s \gamma_s) \, ds\Big) X \,\Big|\, \mathcal{F}_t\Big), \quad 0 \le t \le T. \tag{9.28}$$

Note that for $\delta_t \equiv 1$ the claim is a standard vulnerable claim; in that case,
(9.28) reduces to the formula (9.25). On the other hand, for $\delta_t \equiv 0$ the claim is
essentially default free; in that case, (9.28) reduces to the standard pricing formula
for the claim $X$ in a default-free security market model. For a formal proof of
Proposition 9.24 we refer to the references given in Notes and Comments.


Credit spreads and hazard rates. With doubly stochastic default times the risk-neutral
hazard-rate process $(\gamma_t)$ and the credit spread

$$c(t, T) = -\frac{1}{T - t} (\ln p_1(t, T) - \ln p_0(t, T))$$

of defaultable bonds are closely related. Analytic results are most easily derived for
the instantaneous credit spread given by

$$c(t, t) = \lim_{T \to t} c(t, T) = -\frac{\partial}{\partial T}\Big|_{T=t} (\ln p_1(t, T) - \ln p_0(t, T)). \tag{9.29}$$

Assume that $\tau > t$, so that $p_1(t, t) = p_0(t, t) = 1$. Hence we get

$$\frac{\partial}{\partial T}\Big|_{T=t} \ln p_1(t, T) = \frac{\partial}{\partial T}\Big|_{T=t} p_1(t, T), \tag{9.30}$$

and similarly for $p_0(t, T)$. To compute the derivative in (9.30) we need to distinguish
between the different recovery models. Under RM, we have, from Proposition 9.24,
exchanging expectation and differentiation,

$$-\frac{\partial}{\partial T}\Big|_{T=t} p_1(t, T) = -E^Q\Big(\frac{\partial}{\partial T}\Big|_{T=t} \exp\Big(-\int_t^T (r_s + \delta_s \gamma_s) \, ds\Big) \,\Big|\, \mathcal{F}_t\Big) = r_t + \delta_t \gamma_t. \tag{9.31}$$

Applying (9.31) with $\delta_t \equiv 0$ yields

$$-\frac{\partial}{\partial T}\Big|_{T=t} p_0(t, T) = r_t,$$

so that $c(t, t) = \delta_t \gamma_t$, i.e. the instantaneous credit spread equals the product of hazard
rate and percentage loss given default, which is quite intuitive from an economic
point of view. Under RF, $p_1(t, T)$ is given by the sum of the price of the vulnerable
claim $I_{\{\tau > T\}}$ and the recovery payment $(1 - \delta_\tau)$. Relation (9.31) with $\delta_t \equiv 1$ shows
that the derivative with respect to $T$ of the vulnerable claim at $T = t$ is equal to
$-(r_t + \gamma_t)$. For the recovery payment we get

$$\frac{\partial}{\partial T}\Big|_{T=t} E^Q\Big(\int_t^T \gamma_s (1 - \delta_s) \exp\Big(-\int_t^s R_u \, du\Big) ds \,\Big|\, \mathcal{F}_t\Big) = (1 - \delta_t) \gamma_t.$$

Hence

$$-\frac{\partial}{\partial T}\Big|_{T=t} p_1(t, T) = r_t + \gamma_t - (1 - \delta_t) \gamma_t = r_t + \delta_t \gamma_t,$$

so that $c(t, t)$ is again equal to $\delta_t \gamma_t$. An analogous computation shows that we also
have $c(t, t) = \delta_t \gamma_t$ under RT. However, for $T - t > 0$ the credit spreads corresponding
to the different recovery models differ, as is illustrated in Section 9.5.3
below.
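The behaviour of the credit spread under the three recovery models can be made concrete with constant parameters. The sketch below assumes constant $r$, $\gamma$ and $\delta$ and $t = 0$ (an illustrative simplification; the function names are hypothetical), in which case Theorem 9.23 and Proposition 9.24 give closed-form bond prices.

```python
import math


def bond_prices(r, gamma, delta, ttm):
    """Default-free price p0 and defaultable prices p1 under RT, RF and RM
    for constant r, gamma, delta and time to maturity ttm = T - t."""
    R = r + gamma
    p0 = math.exp(-r * ttm)
    survival_claim = math.exp(-R * ttm)        # price of I_{tau > T}, by (9.25)
    p1 = {
        # RT: (1 - delta) default-free bonds plus delta survival claims
        "RT": (1.0 - delta) * p0 + delta * survival_claim,
        # RF: survival claim plus recovery payment (1 - delta) at tau, by (9.26)
        "RF": survival_claim
              + (1.0 - delta) * gamma * (1.0 - math.exp(-R * ttm)) / R,
        # RM: discounting at r + delta * gamma, by (9.28)
        "RM": math.exp(-(r + delta * gamma) * ttm),
    }
    return p0, p1


def credit_spreads(r, gamma, delta, ttm):
    """c(t, T) = -(ln p1 - ln p0) / (T - t) for each recovery model."""
    p0, p1 = bond_prices(r, gamma, delta, ttm)
    return {model: -(math.log(price) - math.log(p0)) / ttm
            for model, price in p1.items()}


short_end = credit_spreads(r=0.06, gamma=0.05, delta=0.5, ttm=1e-4)
long_end = credit_spreads(r=0.06, gamma=0.05, delta=0.5, ttm=30.0)
```

For very short maturities all three spreads are close to $\delta\gamma = 2.5\%$, in line with the computation above. For a maturity of 30 years and these hypothetical inputs the RF spread is negative, while the RT and RM spreads stay positive.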


Remark 9.25 (limitations of reduced-form models). The analogy with default- 
free term structure models makes the reduced-form models with doubly stochastic 
default times relatively easy to apply. However, in interpreting the results some care 
is required. In particular, one must bear in mind that in these models the default 
intensity does not explicitly take into account the structure of a firm’s outstanding 
risky debt. This can lead to nonsensical results, as is illustrated in the following 
simple example. Consider a firm whose risky debt consists of a single bond with 
maturity $T$. Suppose that the firm raises new funds by issuing another zero-coupon
bond with maturity $\tilde{T} < T$. In order to value this new debt, in a naive application of
the reduced-form approach, one would set up a model with doubly stochastic default
time $\tau$ and calibrate the risk-neutral hazard rate to the price of the existing debt (the
bond with maturity $T$). This model would then be used to price the zero-coupon
bond with the short maturity. If the face value of the new debt is large relative to the
value of the firm, the price obtained in this way is out of line with economic reality.
As an extreme case, suppose that the firm uses the funds raised by the bond issue to
buy back some of its own shares. Clearly, this makes the firm riskier and raises the
probability that the firm defaults on the new issue. This should be reflected in the
default intensity and, in fact, since $\tilde{T} < T$, in the price of the existing debt. More
generally, these considerations show that the validity of the linear pricing rules for 
corporate debt implied by the reduced-form approach must be interpreted with care. 
A formal analysis of these issues is best carried out in firm-value models, where 
the default is explicitly modelled in terms of fundamental economic quantities. An 
excellent discussion can be found in Chapter 2 of Lando (2004).

Notes and Comments


The results of this section, and in particular Theorem 9.23, are originally due to 
Lando (1998); related results were obtained by Jarrow and Turnbull (1995) and 
Jarrow, Lando and Turnbull (1997). An alternative treatment at the textbook level 
(including a more detailed discussion of the RM recovery model) is given in Chap- 
ter 5 of Lando (2004). Proposition 9.24 is due to Duffie and Singleton (1999a); 
extensions are discussed in Becherer and Schweizer (2005). For a generalization of 
Theorem 9.23 to reduced-form models where the default time is not doubly stochas- 
tic we refer to Duffie, Schroder and Skiadas (1996) and Collin-Dufresne, Goldstein 
and Hugonnier (2004). 


9.5 Affine Models 


In order to apply the pricing formulas for doubly stochastic random times obtained
in Theorem 9.23 we need effective ways to evaluate the conditional expectations on
the right-hand side of equations (9.25) and (9.26). In most models, where default is
modelled by a doubly stochastic random time, $(r_t)$ and $(\gamma_t)$ are modelled as functions
of some $p$-dimensional Markovian state variable process $(\Psi_t)$ with state space given
by the domain $D \subset \mathbb{R}^p$, so that $R_t := r_t + \gamma_t$ is of the form $R_t = R(\Psi_t)$ for some
function $R : D \subset \mathbb{R}^p \to \mathbb{R}_+$, and thus the natural background filtration is given by
$(\mathcal{F}_t) = \sigma(\{\Psi_s : s \le t\})$. We hence have to compute conditional expectations of the
form $E(\exp(-\int_t^T R(\Psi_s) \, ds) g(\Psi_T) \mid \mathcal{F}_t)$ for generic $g : D \subset \mathbb{R}^p \to \mathbb{R}_+$. Since
$(\Psi_t)$ is a Markov process, this conditional expectation is given by some function
$f(t, \Psi_t)$ of time and current value $\Psi_t$ of the state variable process. It is well known
that the function $f$ can be characterized in terms of a parabolic PDE; this is the
celebrated Feynman-Kac formula.

This yields an approach to determine $f$ using analytical or numerical techniques
for PDEs. In particular, it is known that in the case where $(\Psi_t)$ belongs to the
class of affine jump diffusions (see below), where $R$ is an affine function and where
$g(\psi) = \exp(u'\psi)$ for some $u \in \mathbb{R}^p$, the function $f$ is of the form

$$f(t, \psi) = \exp(\alpha(t, T) + \beta(t, T)'\psi) \tag{9.32}$$

for deterministic functions $\alpha : [0, T] \to \mathbb{R}$ and $\beta : [0, T] \to \mathbb{R}^p$; moreover, $\alpha$ and
$\beta$ are determined by a $(p + 1)$-dimensional ordinary differential equation (ODE)
system that is easily solved numerically. A relationship of the form (9.32) is often
termed an affine term structure, as it implies that continuously compounded yields
of bonds at time $t$ are affine functions of $\Psi_t$. Because of the ease of implementation,
most reduced-form models used in practice work with affine jump diffusions as state
variable process.

In this section we discuss these results. We concentrate on the case where the state 
variable process is given by a one-dimensional diffusion; extensions to processes 
with jumps will be considered briefly at the end.

9.5.1 Basic Results


The PDE characterization of f. We assume that the state variable process $(\Psi_t)$ is
the unique solution of the SDE

$$d\Psi_t = \mu(\Psi_t) \, dt + \sigma(\Psi_t) \, dW_t, \quad \Psi_0 = \psi \in D, \tag{9.33}$$

with state space given by the domain $D \subset \mathbb{R}$. Here $(W_t)$ is a standard,
one-dimensional Brownian motion on some filtered probability space $(\Omega, \mathcal{F}, P, (\mathcal{F}_t))$,
and $\mu$ and $\sigma$ are continuous functions from $D$ to $\mathbb{R}$, respectively $\mathbb{R}_+$. Consider
functions $R, g : D \to \mathbb{R}_+$. Since $(\Psi_t)$ is Markovian, given the present value $\Psi_t$, the
future evolution $(\Psi_s)_{s \ge t}$ of the state variable process is independent of $\mathcal{F}_t$, and we
obtain

$$E\Big(\exp\Big(-\int_t^T R(\Psi_s) \, ds\Big) g(\Psi_T) \,\Big|\, \mathcal{F}_t\Big) = f(t, \Psi_t) \tag{9.34}$$

for some function $f : [0, T] \times D \to \mathbb{R}_+$. The next lemma gives the characterization
of $f$ in terms of a parabolic PDE announced above.


Lemma 9.26 (Feynman-Kac). If $f$ is once continuously differentiable in $t$ and
twice continuously differentiable in $\psi$, it solves the terminal-value problem

$$f_t + \mu(\psi) f_\psi + \tfrac{1}{2} \sigma^2(\psi) f_{\psi\psi} = R(\psi) f, \quad (t, \psi) \in [0, T) \times D,$$
$$f(T, \psi) = g(\psi), \quad \psi \in D, \tag{9.35}$$

where lower indices denote partial derivatives. Conversely, suppose that the function
$g$ is bounded, that $R(\psi) \ge 0$ for all $\psi \in D$, and that $f : [0, T] \times D \to \mathbb{R}_+$ is a
bounded solution of the terminal-value problem (9.35). Let $(\Psi_t)$ be a solution of the
SDE (9.33). Then $E(\exp(-\int_t^T R(\Psi_s) \, ds) g(\Psi_T) \mid \mathcal{F}_t) = f(t, \Psi_t)$.


The Feynman-Kac formula is a standard result of stochastic calculus and it is
discussed in many textbooks on stochastic processes and financial mathematics, so
we omit the proof (references are given in Notes and Comments).


Affine term structure. The following assumption ensures that the solution of the
PDE (9.35), with terminal condition $g(\psi) = \exp(u\psi)$, $u\psi \le 0$ for $\psi \in D$, is of
the form (9.32), so that we have an affine term structure. Note that $g \equiv 1$ for $u = 0$;
this is the appropriate terminal condition for pricing zero-coupon bonds.


Assumption 9.27. $R$, $\mu$ and $\sigma^2$ are affine functions of $\psi$, i.e. there are constants
$\rho^0$, $\rho^1$, $k^0$, $k^1$, $h^0$ and $h^1$ such that $R(\psi) = \rho^0 + \rho^1 \psi$, $\mu(\psi) = k^0 + k^1 \psi$ and
$\sigma^2(\psi) = h^0 + h^1 \psi$. Moreover, for all $\psi \in D$ we have $h^0 + h^1 \psi \ge 0$ and
$\rho^0 + \rho^1 \psi \ge 0$.

Fix some $T > 0$. We try to find a solution of (9.35) of the form $f(t, \psi) =
\exp(\alpha(t, T) + \beta(t, T)\psi)$ for continuously differentiable functions $\alpha(\cdot, T)$ and
$\beta(\cdot, T)$. As $f(T, \psi) = \exp(u\psi)$, we immediately obtain the terminal condition
$\alpha(T, T) = 0$, $\beta(T, T) = u$. Denote by $\alpha'(\cdot, T)$ and $\beta'(\cdot, T)$ the derivatives of $\alpha$ and
$\beta$ with respect to $t$. Using the special form of $f$ we obtain that

$$f_t = (\alpha' + \beta'\psi) f, \quad f_\psi = \beta f \quad \text{and} \quad f_{\psi\psi} = \beta^2 f.$$

Hence, under Assumption 9.27 the PDE (9.35) takes the form

$$(\alpha' + \beta'\psi) f + (k^0 + k^1 \psi) \beta f + \tfrac{1}{2} (h^0 + h^1 \psi) \beta^2 f = (\rho^0 + \rho^1 \psi) f.$$

Dividing by $f$ and rearranging we obtain

$$\alpha' + k^0 \beta + \tfrac{1}{2} h^0 \beta^2 - \rho^0 + (\beta' + k^1 \beta + \tfrac{1}{2} h^1 \beta^2 - \rho^1) \psi = 0.$$

Since this equation must hold for all $\psi \in D$, we obtain the following ODE system:

$$\beta'(t, T) = \rho^1 - k^1 \beta(t, T) - \tfrac{1}{2} h^1 \beta^2(t, T), \quad \beta(T, T) = u, \tag{9.36}$$

$$\alpha'(t, T) = \rho^0 - k^0 \beta(t, T) - \tfrac{1}{2} h^0 \beta^2(t, T), \quad \alpha(T, T) = 0. \tag{9.37}$$


The ODE (9.36) for $\beta(\cdot, T)$ is a so-called Riccati equation. While explicit solutions
exist only in certain special cases, the ODE is easily solved numerically. The
ODE (9.37) for $\alpha(\cdot, T)$ can be solved by simple (numerical) integration once $\beta$ has
been determined. Summing up, we have the following proposition.


Proposition 9.28. Suppose that Assumption 9.27 holds, that the ODE system
(9.36), (9.37) has a unique solution $(\alpha, \beta)$ on $[0, T]$ and that there is some $C$ such
that $\beta(t, T)\psi \le C$ for all $t \in [0, T]$, $\psi \in D$. Then

$$E\Big(\exp\Big(-\int_t^T R(\Psi_s) \, ds\Big) \exp(u\Psi_T) \,\Big|\, \mathcal{F}_t\Big) = \exp(\alpha(t, T) + \beta(t, T)\Psi_t).$$


Proof. The result follows immediately from Lemma 9.26, as our assumption on $\beta$
implies that $f(t, \psi) = \exp(\alpha(t, T) + \beta(t, T)\psi)$ is bounded.
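The ODE system (9.36), (9.37) is indeed straightforward to solve numerically. The following sketch integrates it backwards from the terminal condition at $t = T$ to $t = 0$ with the classical fourth-order Runge-Kutta scheme; the choice of scheme, the step count and the function names are assumptions made for this illustration, and any standard ODE solver could be used instead.

```python
import math


def solve_affine_odes(rho0, rho1, k0, k1, h0, h1, T, u=0.0, n_steps=1000):
    """Integrate (9.36)-(9.37) from t = T back to t = 0 with classical RK4.
    Returns (alpha(0, T), beta(0, T))."""
    def rhs(beta):
        d_beta = rho1 - k1 * beta - 0.5 * h1 * beta**2     # (9.36)
        d_alpha = rho0 - k0 * beta - 0.5 * h0 * beta**2    # (9.37)
        return d_alpha, d_beta

    alpha, beta = 0.0, u          # terminal condition at t = T
    dt = -T / n_steps             # negative step: we move backwards in t
    for _ in range(n_steps):
        a1, b1 = rhs(beta)
        a2, b2 = rhs(beta + 0.5 * dt * b1)
        a3, b3 = rhs(beta + 0.5 * dt * b2)
        a4, b4 = rhs(beta + dt * b3)
        alpha += dt * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
        beta += dt * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0
    return alpha, beta


def affine_expectation(psi, rho0, rho1, k0, k1, h0, h1, T, u=0.0):
    """exp(alpha(0, T) + beta(0, T) * psi), as in Proposition 9.28 at t = 0."""
    alpha, beta = solve_affine_odes(rho0, rho1, k0, k1, h0, h1, T, u)
    return math.exp(alpha + beta * psi)
```

A simple sanity check is the degenerate case $h^0 = h^1 = 0$ with $k^1 = -\kappa$, where $(\Psi_t)$ is deterministic and $\beta(0, T) = -(1 - e^{-\kappa T})/\kappa$ for $\rho^1 = 1$, $u = 0$.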


9.5.2 The CIR Square-Root Diffusion 


A very popular affine model is the square-root diffusion model proposed by Cox,
Ingersoll and Ross (1985) as a model for the short rate of interest. In this model $(\Psi_t)$
is given by the solution of the SDE

$$d\Psi_t = \kappa(\bar{\theta} - \Psi_t) \, dt + \sigma \sqrt{\Psi_t} \, dW_t, \quad \Psi_0 = \psi > 0, \tag{9.38}$$

for parameters $\kappa, \bar{\theta}, \sigma > 0$ and state space $D = [0, \infty)$. Clearly, (9.38) is an affine
model in the sense of Assumption 9.27; the parameters are given by $k^0 = \kappa\bar{\theta}$,
$k^1 = -\kappa$, $h^0 = 0$ and $h^1 = \sigma^2$.

It is well known that the SDE (9.38) admits a global solution (see Notes and
Comments for a reference). This issue is non-trivial since the square-root function
is not Lipschitz and since one has to ensure that the solution remains in $D$ for all
$t > 0$. Note that (9.38) implies that $(\Psi_t)$ is a mean-reverting process: if $\Psi_t$ deviates
from the mean-reversion level $\bar{\theta}$, the process is pulled back towards $\bar{\theta}$. Moreover, if
the mean reversion is sufficiently strong relative to the volatility, trajectories never
reach zero. More precisely, let $\tau_0(\Psi) := \inf\{t \ge 0 : \Psi_t = 0\}$. It is well known
that for $\kappa\bar{\theta} \ge \tfrac{1}{2}\sigma^2$ we have $P(\tau_0(\Psi) < \infty) = 0$, whereas for $\kappa\bar{\theta} < \tfrac{1}{2}\sigma^2$ we have
$P(\tau_0(\Psi) < \infty) = 1$.
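The condition $\kappa\bar{\theta} \ge \tfrac{1}{2}\sigma^2$ and the mean reversion of (9.38) can be explored by simulation. The sketch below uses an Euler scheme with "full truncation" (negative values are replaced by zero inside the drift and the square root); this discretization, the parameter values and the function names are assumptions chosen for illustration, and the scheme is only an approximation of the exact dynamics.

```python
import math
import random


def feller_condition_holds(kappa, theta_bar, sigma):
    """True if kappa * theta_bar >= sigma^2 / 2, i.e. zero is never reached."""
    return kappa * theta_bar >= 0.5 * sigma**2


def simulate_cir_terminal(kappa, theta_bar, sigma, psi0, horizon, n_steps, rng):
    """Approximate draw of Psi at the horizon via full-truncation Euler steps."""
    dt = horizon / n_steps
    psi = psi0
    for _ in range(n_steps):
        psi_pos = max(psi, 0.0)   # truncate before using drift and volatility
        psi += (kappa * (theta_bar - psi_pos) * dt
                + sigma * math.sqrt(psi_pos * dt) * rng.gauss(0.0, 1.0))
    return max(psi, 0.0)


def cir_mean(kappa, theta_bar, psi0, horizon):
    """Exact mean E(Psi_horizon) = theta_bar + (psi0 - theta_bar) e^{-kappa h}."""
    return theta_bar + (psi0 - theta_bar) * math.exp(-kappa * horizon)


rng = random.Random(1)
draws = [simulate_cir_terminal(0.6, 0.02, 0.14, 0.05, 5.0, 200, rng)
         for _ in range(1000)]
sample_mean = sum(draws) / len(draws)   # compare with cir_mean(0.6, 0.02, 0.05, 5.0)
```

Starting above the mean-reversion level, the sample mean of the simulated terminal values moves towards $\bar{\theta}$ at rate $\kappa$, as the exact mean formula predicts.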
In the CIR square-root model the Riccati equations (9.36) and (9.37) can be solved
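explicitly. Using Proposition 9.28, one has

$$E\Big(\exp\Big(-\int_t^T (\rho^0 + \rho^1 \Psi_s) \, ds\Big) \,\Big|\, \mathcal{F}_t\Big) = \exp(\alpha(T - t) + \beta(T - t)\Psi_t),$$

with

$$\beta(\tau) = \frac{-2\rho^1 (e^{\gamma\tau} - 1)}{\gamma - \kappa + e^{\gamma\tau}(\gamma + \kappa)}, \tag{9.39}$$

$$\alpha(\tau) = -\rho^0 \tau + \frac{2\kappa\bar{\theta}}{\sigma^2} \ln\Big(\frac{2\gamma e^{\tau(\gamma + \kappa)/2}}{\gamma - \kappa + e^{\gamma\tau}(\gamma + \kappa)}\Big), \tag{9.40}$$

and $\tau := T - t$, $\gamma := \sqrt{\kappa^2 + 2\sigma^2 \rho^1}$. (In these formulas $\tau$ and $\gamma$ denote time to
maturity and a constant, not the default time and the hazard rate.) These formulas are the key to pricing bonds
in models where the risk-free short rate and default intensities are affine functions
of independent square-root processes, as is shown in the next example.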
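Before turning to the example, the closed-form solution (9.39), (9.40) can be coded directly. The sketch below is a minimal implementation under the stated CIR parametrization (terminal condition $u = 0$); the function names and the numerical inputs are hypothetical. A useful check is that the functions satisfy the Riccati system in time-to-maturity form, $\beta_\tau = -\rho^1 - \kappa\beta + \tfrac{1}{2}\sigma^2\beta^2$ and $\alpha_\tau = -\rho^0 + \kappa\bar{\theta}\beta$, which follows from (9.36), (9.37) with $\tau = T - t$.

```python
import math


def cir_alpha_beta(tau, rho0, rho1, kappa, theta_bar, sigma):
    """alpha(tau) and beta(tau) from (9.39)-(9.40); tau is time to maturity."""
    g = math.sqrt(kappa**2 + 2.0 * sigma**2 * rho1)   # the constant gamma
    e = math.exp(g * tau)
    denom = g - kappa + e * (g + kappa)
    beta = -2.0 * rho1 * (e - 1.0) / denom
    alpha = (-rho0 * tau
             + (2.0 * kappa * theta_bar / sigma**2)
             * math.log(2.0 * g * math.exp(0.5 * tau * (g + kappa)) / denom))
    return alpha, beta


def cir_discount_factor(psi, tau, rho0, rho1, kappa, theta_bar, sigma):
    """E[exp(-int (rho0 + rho1 Psi_s) ds)] given the current value psi."""
    alpha, beta = cir_alpha_beta(tau, rho0, rho1, kappa, theta_bar, sigma)
    return math.exp(alpha + beta * psi)


# Hypothetical inputs: Psi is used directly as a hazard rate (rho0 = 0, rho1 = 1),
# so the result is a five-year survival probability.
survival_5y = cir_discount_factor(0.05, 5.0, 0.0, 1.0, 0.6, 0.02, 0.14)
```

With $\rho^0 = 0$ and $\rho^1 = 1$ the function returns a survival probability when $(\Psi_t)$ is interpreted as a hazard rate, or a zero-coupon bond price when it is interpreted as a short rate.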


Example 9.29 (a three-factor model). We now consider the pricing of zero-coupon
bonds in a three-factor model similar to models that are frequently used in the
literature. We assume that $\Psi_t = (\Psi_{t,1}, \Psi_{t,2}, \Psi_{t,3})'$ is a vector of three independent
square-root diffusions with dynamics $d\Psi_{t,i} = \kappa_i(\bar{\theta}_i - \Psi_{t,i}) \, dt + \sigma_i \sqrt{\Psi_{t,i}} \, dW_{t,i}$ for
independent Brownian motions $(W_{t,i})$, $i = 1, 2, 3$. The risk-free short rate of interest
is given by $r_t = r_0 + \Psi_{t,2} - \Psi_{t,1}$ for a constant $r_0 \ge 0$; the hazard rate of the
counterparty under consideration is given by $\gamma_t = \gamma_1 \Psi_{t,1} + \Psi_{t,3}$ for some constant
$\gamma_1 > 0$. This parametrization allows for negative instantaneous correlation between
$(r_t)$ and $(\gamma_t)$, which is in line with empirical evidence. Note, however, that this
negative correlation comes at the expense of possibly negative riskless interest rates.
In this context the price of a default-free zero-coupon bond is given by

$$p_0(t, T) = E\Big(\exp\Big(-\int_t^T r_s \, ds\Big) \,\Big|\, \mathcal{F}_t\Big) = e^{-r_0(T - t)} E\Big(\exp\Big(-\int_t^T \Psi_{s,2} \, ds\Big) \,\Big|\, \mathcal{F}_t\Big) E\Big(\exp\Big(\int_t^T \Psi_{s,1} \, ds\Big) \,\Big|\, \mathcal{F}_t\Big), \tag{9.41}$$

where we have used the independence of $(\Psi_{t,1})$, $(\Psi_{t,2})$, $(\Psi_{t,3})$. Each of the terms
in (9.41) can be evaluated using the above formulas for $\alpha$ and $\beta$ (equations (9.39)
and (9.40)). Assuming that we have recovery of treasury in default (see Section 9.4.1)
and a deterministic percentage loss given default $\delta$, we obtain that the price of a
defaultable zero-coupon bond is given by

$$p_1(t, T) = (1 - \delta) p_0(t, T) + \delta E\Big(\exp\Big(-\int_t^T (r_s + \gamma_s) \, ds\Big) \,\Big|\, \mathcal{F}_t\Big).$$

By definition of $r_t$ and $\gamma_t$ the last term on the right-hand side equals

$$\delta E\Big(\exp\Big(-\int_t^T (r_0 + (\gamma_1 - 1)\Psi_{s,1} + \Psi_{s,2} + \Psi_{s,3}) \, ds\Big) \,\Big|\, \mathcal{F}_t\Big),$$

which can be evaluated in a similar way to the evaluation of expression (9.41). In the
next section we will show how one deals with more complicated recovery models,
such as recovery of face value.
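The computations of Example 9.29 can be sketched in a few lines. Every numerical value below (factor parameters, current factor values, $r_0$, $\gamma_1$, $\delta$) is a hypothetical choice made for illustration. Note that the term involving $\Psi_{s,1}$ in (9.41) corresponds to $\rho^1 = -1$ in (9.39), (9.40), so the sketch additionally assumes $\kappa_1^2 > 2\sigma_1^2$ to keep the constant $\gamma$ real.

```python
import math


def cir_transform(psi, tau, rho1, kappa, theta_bar, sigma):
    """E[exp(-rho1 * int_0^tau Psi_s ds) | Psi_0 = psi] via (9.39)-(9.40)
    with rho0 = 0; rho1 may be negative if kappa^2 + 2 sigma^2 rho1 > 0."""
    disc = kappa**2 + 2.0 * sigma**2 * rho1
    if disc <= 0.0:
        raise ValueError("kappa^2 + 2 sigma^2 rho1 must be positive")
    g = math.sqrt(disc)
    e = math.exp(g * tau)
    denom = g - kappa + e * (g + kappa)
    beta = -2.0 * rho1 * (e - 1.0) / denom
    alpha = (2.0 * kappa * theta_bar / sigma**2) * math.log(
        2.0 * g * math.exp(0.5 * tau * (g + kappa)) / denom)
    return math.exp(alpha + beta * psi)


# Hypothetical (kappa_i, theta_bar_i, sigma_i, current value Psi_{t,i}), i = 1, 2, 3.
FACTORS = [(0.5, 0.01, 0.05, 0.01), (0.3, 0.04, 0.08, 0.04), (0.6, 0.02, 0.10, 0.02)]


def three_factor_bond_prices(tau, r0=0.02, gamma1=0.5, delta=0.5, factors=FACTORS):
    """Return (p0, p1): default-free price (9.41) and defaultable price under RT."""
    (k1, th1, s1, x1), (k2, th2, s2, x2), (k3, th3, s3, x3) = factors
    common = math.exp(-r0 * tau) * cir_transform(x2, tau, 1.0, k2, th2, s2)
    p0 = common * cir_transform(x1, tau, -1.0, k1, th1, s1)
    # E[exp(-int (r_s + gamma_s) ds)]: Psi_1 enters with weight gamma1 - 1
    survival_term = (common
                     * cir_transform(x1, tau, gamma1 - 1.0, k1, th1, s1)
                     * cir_transform(x3, tau, 1.0, k3, th3, s3))
    p1 = (1.0 - delta) * p0 + delta * survival_term
    return p0, p1


p0, p1 = three_factor_bond_prices(tau=5.0)
spread = -(math.log(p1) - math.log(p0)) / 5.0
```

Because the factors are independent, each expectation factorizes into one-dimensional CIR transforms, which is the reason the three-factor model stays tractable.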
9.5.3 Extensions


A jump-diffusion model for $(\Psi_t)$. We briefly discuss an extension of the basic
model (9.33), where the economic factor process $(\Psi_t)$ follows a diffusion with jumps.
Adding jumps to the dynamics of $(\Psi_t)$ provides more flexibility for modelling default
correlations in models with conditionally independent defaults (see Section 9.6.3
below).

In this section we assume that $(\Psi_t)$ is the unique solution of the SDE

$$d\Psi_t = \mu(\Psi_t) \, dt + \sigma(\Psi_t) \, dW_t + dZ_t, \quad \Psi_0 = \psi \in D. \tag{9.42}$$

Here $(Z_t)$ is a pure jump process whose jump intensity at time $t$ is equal to $\lambda^Z(\Psi_t)$
for some function $\lambda^Z : D \to \mathbb{R}_+$ and whose jump-size distribution has df $\nu$ on $\mathbb{R}$.
Intuitively this means that given the trajectory $(\Psi_t(\omega))_{t \ge 0}$ of the factor process,
$(Z_t)$ jumps at the jump times of an inhomogeneous Poisson process (see Section 10.2.7)
with time-varying intensity $\lambda^Z(\Psi_t)$; the size of the jumps has df $\nu$.
Suppose now that Assumption 9.27 holds, and that $\lambda^Z(\psi) = l^0 + l^1 \psi$ for constants
$l^0$, $l^1$ such that $\lambda^Z(\psi) > 0$ for all $\psi \in D$. In that case we say that $(\Psi_t)$ follows
an affine jump diffusion. For $x \in \mathbb{R}$ denote by $\hat{\nu}(x) = \int_{\mathbb{R}} e^{-xy} \, d\nu(y) \in (0, \infty]$
the extended Laplace-Stieltjes transform of $\nu$ (with domain $\mathbb{R}$ instead of the usual
domain $[0, \infty)$). Consider the following extension of the ODE system (9.36), (9.37):

$$\beta'(t, T) = \rho^1 - k^1 \beta(t, T) - \tfrac{1}{2} h^1 \beta^2(t, T) - l^1(\hat{\nu}(-\beta(t, T)) - 1), \tag{9.43}$$

$$\alpha'(t, T) = \rho^0 - k^0 \beta(t, T) - \tfrac{1}{2} h^0 \beta^2(t, T) - l^0(\hat{\nu}(-\beta(t, T)) - 1), \tag{9.44}$$

with terminal condition $\beta(T, T) = u$ for some $u \le 0$ and $\alpha(T, T) = 0$. Suppose
that the system (9.44), (9.43) has a unique solution $\alpha$, $\beta$ and that $\beta(t, T)\psi \le C$
for all $t \in [0, T]$, $\psi \in D$ (for $l^0$ or $l^1 \ne 0$ this implicitly implies that
$\hat{\nu}(-\beta(t, T)) < \infty$ for all $t$). Define $f(t, \psi) = \exp(\alpha(t, T) + \beta(t, T)\psi)$. Using
similar arguments to those above, it can then be shown that the conditional expectation
$E(\exp(-\int_t^T R(\Psi_s) \, ds) \exp(u\Psi_T) \mid \mathcal{F}_t)$ equals $f(t, \Psi_t)$.


Example 9.30 (the model of Duffie and Gârleanu (2001)). The following
jump-diffusion model has been used in the literature on CDO pricing. The dynamics of
$(\Psi_t)$ are given by

$$d\Psi_t = \kappa(\bar{\theta} - \Psi_t) \, dt + \sigma \sqrt{\Psi_t} \, dW_t + dZ_t \tag{9.45}$$

for parameters $\kappa, \bar{\theta}, \sigma > 0$ and a jump process $(Z_t)$ with constant jump intensity
$l^0 > 0$ and exponentially distributed jump sizes with parameter $1/\mu$. Following
Duffie and Gârleanu, we will sometimes call the model (9.45) a basic affine jump
diffusion. Note that these assumptions imply that the mean of $\nu$ is equal to $\mu$ and that
$\nu$ has support $[0, \infty)$, so that $(\Psi_t)$ has only upward jumps. Hence the existence of a
solution to (9.45) follows from the existence of solutions in the pure diffusion case.
It is relatively easy to show that for $t \to \infty$ we obtain $E(\Psi_t) \to \bar{\theta} + l^0 \mu / \kappa$. For
illustrative purposes we present the parameter values used in Duffie and Gârleanu
(2001) in Table 9.1; a typical trajectory of $(\Psi_t)$ is simulated in Figure 9.8.

Table 9.1. Parameters used in the model of Duffie and Gârleanu (2001). Recall that $l^0$ gives
the intensity of jumps in the factor process and $\mu$ gives the average jump size. With these parameters
the average waiting time for a jump in the systematic factor process is $1/l^0 = 5$ years.

Figure 9.8. (a) A typical trajectory of the basic affine jump diffusion model (9.45) and
(b) the corresponding jumps of $(Z_t)$. The parameter values used are given in Table 9.1; the
initial value $\Psi_0$ is equal to the long-run mean $\bar{\theta} + (l^0 \mu)/\kappa$ marked by the horizontal line.

Next we compute the Laplace-Stieltjes transform $\hat{\nu}$. We obtain for $u > -1/\mu$ that

$$\hat{\nu}(u) = \int_0^\infty e^{-ux} (1/\mu) e^{-x/\mu} \, dx = \frac{1}{1 + \mu u};$$

for $u \le -1/\mu$ we get $\hat{\nu}(u) = \infty$. We therefore have all the necessary ingredients
to set up the Riccati equations (9.44) and (9.43). In the case of the model (9.45) it
is in fact possible to solve these equations explicitly (see, for example, Chapter 11
of Duffie and Singleton (2003)). However, the explicit solution is given by a very
lengthy expression, so we omit the details.
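Since the explicit solution is lengthy, a numerical solution of (9.43), (9.44) is a practical alternative. The sketch below integrates the system for the basic affine jump diffusion in time to maturity $\tau = T - t$ (so the signs of the right-hand sides are reversed) with a classical Runge-Kutta scheme and terminal condition $u = 0$. The jump intensity $l^0 = 0.2$ matches the five-year average waiting time stated in the caption of Table 9.1; all other parameter values, the solver choice and the function names are assumptions made for illustration only.

```python
import math


def bajd_alpha_beta(tau, rho0, rho1, kappa, theta_bar, sigma, l0, mu, n_steps=2000):
    """Solve (9.43)-(9.44) for the model (9.45) in time to maturity tau.
    Uses k0 = kappa*theta_bar, k1 = -kappa, h0 = 0, h1 = sigma^2, l1 = 0 and
    hat nu(-beta) = 1 / (1 - mu * beta), valid for beta < 1/mu."""
    def rhs(beta):
        nu_hat = 1.0 / (1.0 - mu * beta)
        d_beta = -rho1 - kappa * beta + 0.5 * sigma**2 * beta**2
        d_alpha = -rho0 + kappa * theta_bar * beta + l0 * (nu_hat - 1.0)
        return d_alpha, d_beta

    alpha, beta = 0.0, 0.0
    h = tau / n_steps
    for _ in range(n_steps):
        a1, b1 = rhs(beta)
        a2, b2 = rhs(beta + 0.5 * h * b1)
        a3, b3 = rhs(beta + 0.5 * h * b2)
        a4, b4 = rhs(beta + h * b3)
        alpha += h * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
        beta += h * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0
    return alpha, beta


def cir_closed_form(tau, rho1, kappa, theta_bar, sigma):
    """(9.39)-(9.40) with rho0 = 0; the no-jump benchmark (l0 = 0)."""
    g = math.sqrt(kappa**2 + 2.0 * sigma**2 * rho1)
    e = math.exp(g * tau)
    denom = g - kappa + e * (g + kappa)
    beta = -2.0 * rho1 * (e - 1.0) / denom
    alpha = (2.0 * kappa * theta_bar / sigma**2) * math.log(
        2.0 * g * math.exp(0.5 * tau * (g + kappa)) / denom)
    return alpha, beta


def survival_probability(psi, tau, kappa, theta_bar, sigma, l0, mu):
    """E[exp(-int_0^tau Psi_s ds)] when Psi itself is the hazard rate."""
    alpha, beta = bajd_alpha_beta(tau, 0.0, 1.0, kappa, theta_bar, sigma, l0, mu)
    return math.exp(alpha + beta * psi)


# kappa, theta_bar, sigma, mu and psi are hypothetical; l0 = 0.2 as in the caption.
with_jumps = survival_probability(0.05, 5.0, 0.6, 0.02, 0.14, 0.2, 0.1)
without_jumps = survival_probability(0.05, 5.0, 0.6, 0.02, 0.14, 0.0, 0.1)
```

Because the jumps are upward, switching them on lowers the survival probability; with $l^0 = 0$ the numerical solution should reproduce the CIR closed form.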


Application to recovery payments. According to Theorem 9.23, in a model with a
doubly stochastic default time $\tau$ with risk-neutral hazard rate $\gamma(\Psi_t)$, the price in $t$
of a recovery payment of size $(1 - \delta)$ at the default time $\tau$ equals

$$(1 - \delta) E\Big(\int_t^T \gamma(\Psi_s) \exp\Big(-\int_t^s R(\Psi_u) \, du\Big) ds \,\Big|\, \mathcal{F}_t\Big), \tag{9.46}$$

where again $R(\psi) = r(\psi) + \gamma(\psi)$. Using the Fubini Theorem this equals

$$(1 - \delta) \int_t^T E\Big(\gamma(\Psi_s) \exp\Big(-\int_t^s R(\Psi_u) \, du\Big) \,\Big|\, \Psi_t\Big) ds. \tag{9.47}$$

Suppose now that $\gamma(\psi) = \gamma^0 + \gamma^1 \psi$, that $R(\psi) = \rho^0 + \rho^1 \psi$ and that $(\Psi_t)$
is given by an affine jump diffusion as introduced above. In that case the inner
expectation in (9.47) is given by a function $F(t, s, \Psi_t)$. This function can be computed
using an extension of the basic affine methodology, so that (9.47) can be
computed by one-dimensional numerical integration. Define for $0 \le t \le s$ the
function $f(t, s, \psi) = \exp(\alpha(t, s) + \beta(t, s)\psi)$, where $\alpha(\cdot, s)$ and $\beta(\cdot, s)$ solve the
ODEs (9.44), (9.43) with terminal condition $\alpha(s, s) = \beta(s, s) = 0$. Denote by $\hat{\nu}'(x)$
the derivative of the Laplace-Stieltjes transform of $\nu$. Then it is a straightforward
application of standard calculus to show that, modulo some integrability conditions,
$F(t, s, \psi) = f(t, s, \psi)(A(t, s) + B(t, s)\psi)$, where $A(\cdot, s)$ and $B(\cdot, s)$ solve the
following ODE system:

$$B'(t, s) + k^1 B(t, s) + h^1 \beta(t, s) B(t, s) - l^1 \hat{\nu}'(-\beta(t, s)) B(t, s) = 0, \tag{9.48}$$

$$A'(t, s) + k^0 B(t, s) + h^0 \beta(t, s) B(t, s) - l^0 \hat{\nu}'(-\beta(t, s)) B(t, s) = 0, \tag{9.49}$$

with terminal condition $A(s, s) = \gamma^0$, $B(s, s) = \gamma^1$. Again, (9.48) and (9.49) are
straightforward to evaluate numerically.
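A possible numerical treatment of (9.47) for the basic affine jump diffusion (9.45) is sketched below. Because the coefficients do not depend on time, $\alpha$, $\beta$, $A$ and $B$ depend on $(t, s)$ only through $s - t$, so a single Runge-Kutta pass in $s - t$ yields the integrand $F$ on a grid, and the outer integral is approximated by the trapezoidal rule. The solver, the quadrature rule, the parameter values and the function names are all assumptions made for this illustration.

```python
import math


def recovery_payment_value(psi, T, delta, rho, gam, kappa, theta_bar, sigma,
                           l0, mu, n_steps=2000):
    """Approximate (9.47) at t = 0 for model (9.45) with exponential jumps.
    rho = (rho0, rho1) and gam = (gamma0, gamma1) are the affine coefficients
    of R and of the hazard rate. Returns (value, f(0, T, psi))."""
    rho0, rho1 = rho
    gam0, gam1 = gam

    def rhs(state):
        alpha, beta, A, B = state
        one_minus = 1.0 - mu * beta
        nu_hat = 1.0 / one_minus                 # hat nu(-beta)
        nu_hat_prime = -mu / one_minus**2        # hat nu'(-beta)
        d_beta = -rho1 - kappa * beta + 0.5 * sigma**2 * beta**2
        d_alpha = -rho0 + kappa * theta_bar * beta + l0 * (nu_hat - 1.0)
        d_B = -kappa * B + sigma**2 * beta * B                   # from (9.48)
        d_A = kappa * theta_bar * B - l0 * nu_hat_prime * B      # from (9.49)
        return (d_alpha, d_beta, d_A, d_B)

    def rk4_step(state, h):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
        return tuple(s + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
                     for s, a, b, c, d in zip(state, k1, k2, k3, k4))

    def integrand(state):
        alpha, beta, A, B = state
        return math.exp(alpha + beta * psi) * (A + B * psi)     # F(0, s, psi)

    h = T / n_steps
    state = (0.0, 0.0, gam0, gam1)     # conditions at s - t = 0
    integral, previous = 0.0, integrand(state)
    for _ in range(n_steps):
        state = rk4_step(state, h)
        current = integrand(state)
        integral += 0.5 * h * (previous + current)               # trapezoidal rule
        previous = current
    return (1.0 - delta) * integral, math.exp(state[0] + state[1] * psi)
```

A consistency check is available when interest rates are zero and the payment is one unit ($\delta = 0$, $R = \gamma$): the value of the recovery payment is then the default probability up to $T$, so it should equal one minus the survival probability returned as the second output.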


Example 9.31 (defaultable zero-coupon bonds and CDS). We now have all the
necessary ingredients to compute prices and credit spreads of defaultable
zero-coupon bonds and CDS spreads in a model with a doubly stochastic default time
with hazard rate $\gamma_t = \Psi_t$ for a one-dimensional affine jump diffusion $(\Psi_t)$. In
Figure 9.9 we plot the credit spread for defaultable bonds for the recovery assumptions
discussed in Section 9.4.1. Note that, for $T \to t$, i.e. for time to maturity close to
zero, the spread tends to $c(t, t) = \delta\Psi_t > 0$, as claimed in Section 9.4.4; in particular,
the credit spread does not vanish as $T \to t$. This is in stark contrast to firm-value
models, where typically $c(t, t) = 0$, as was shown in Section 8.2.1. Note further
that, for $T - t$ large, under the RF assumption we obtain negative credit spreads,
which is clearly unrealistic. These negative credit spreads are caused by the fact that
under RF we obtain a payment of fixed size $1 - \delta$ immediately at default. If the
default-free interest rate $r$ is relatively large, it may happen that

$$E^Q\Big(\exp\Big(-\int_0^\tau r_s \, ds\Big) (1 - \delta) I_{\{\tau \le T\}}\Big) > E^Q\Big(\exp\Big(-\int_0^T r_s \, ds\Big) I_{\{\tau \le T\}}\Big),$$

even if $\delta > 0$. This stems from the fact that on the right-hand side discounting is
done over the whole period $[0, T]$ (as opposed to $[0, \tau]$), so that discounting has a
large impact on the value of the right-hand side, compensating the higher terminal
pay-off. In Figure 9.10 we have plotted the fair spreads for CDS with and without
accrued payments for varying maturities, assuming that the risk-neutral hazard rate
follows a basic affine jump diffusion.


Notes and Comments 


The Feynman—Kac formula is discussed, for example, in Section 4.5 of Bjork (1998), 
or, at a slightly more technical level, in Section 5.7 of Karatzas and Shreve (1988). 

Important original papers on affine models in term structure modelling are Duffie
and Kan (1996) for diffusion models and Duffie, Pan and Singleton (2000) for jump
diffusions. The latter paper also contains other applications of affine models, such
as the pricing of equity options under stochastic volatility and econometric issues
related to affine models. It should be mentioned that there is also a converse to
Proposition 9.28: if the conditional expectations
$E(\exp(-\int_t^T R(\Psi_s)\,ds)\,e^{u\Psi_T} \mid \mathcal{F}_t)$
are all exponentially affine functions of $\Psi_t$, the process $(\Psi_t)$ is necessarily affine
(see Duffie and Kan (1996) and in particular Duffie, Filipović and Schachermayer
(2003) for details).

Figure 9.9. Spreads of corporate zero-coupon bonds in the Duffie-Gârleanu model (9.45)
for various recovery assumptions. The parameters of $(\Psi_t)$ are given in Table 9.1; the initial
value is $\Psi_0 \approx 0.0533$. The risk-free interest rate and the loss given default are deterministic
and are given by $r = 6\%$ and $\delta = 0.5$. Note that under the RF recovery model, the spread
becomes negative for large times to maturity.

Figure 9.10. Fair CDS spreads in the Duffie-Gârleanu model (9.45) for a CDS contract
with semiannual premium payments and varying time to maturity. The parameters of $(\Psi_t)$
are given in Table 9.1; the initial value is $\Psi_0 \approx 0.0533$. The risk-free interest rate and the loss
given default are deterministic and are given by $r = 6\%$ and $\delta = 0.5$. Note that, for small
time to maturity, the fair swap spread is approximately equal to $\delta\Psi_0 \approx 2.7\%$.

The mathematical properties of the CIR model are discussed in, for example,
Chapter 6.2 of Lamberton and Lapeyre (1996), where the explicit solution (9.39)
and (9.40) of the Riccati equations in the CIR model is also derived. The model
studied in Example 9.29 is akin to models proposed by Duffie and Singleton (1999a).
Problems related to the modelling of negative correlation between state variable
processes in an affine setting are discussed in Section 5.8 of Lando (2004). It is possible
to compute the conditional expectation (9.46) for the price of a recovery payment
as the solution of a parabolic PDE, which stems from the Feynman-Kac formula
(see, for example, Lando 1998 for details). Empirical work on affine models for
defaultable bonds includes the publications of Duffee (1999) and Driessen (2005).


9.6 Conditionally Independent Defaults 


We begin our analysis of reduced-form models for portfolio credit risk with a brief 
overview of the existing model classes. 


9.6.1 Reduced-Form Models for Portfolio Credit Risk 


The simplest reduced-form models for portfolio credit risk are models with condi- 
tionally independent defaults. In this class default times are independent given the 
realization of some observable economic background process; hence these mod- 
els are an extension of the static Bernoulli mixture models of Section 8.4. More 
sophisticated models for dependent defaults include copula models and models 
with interacting intensities. Copula models have become popular in practice and we 
give an in-depth discussion of this model class in Sections 9.7 and 9.8; models with 
interacting intensities are discussed in Section 9.8.3. 

The common feature of the latter two model classes, as opposed to models with 
conditionally independent defaults, is the presence of default contagion and counter- 
party risk. Loosely speaking, this means that the conditional default probability of 
a non-defaulted firm jumps (usually upwards) given the additional information that 
some other firm has defaulted. As a consequence the credit spread of bonds issued 
by a non-defaulted firm increases given the news that some other firm has defaulted.
Mathematically, default contagion is reflected in jumps in the martingale default 
intensity of non-defaulted firms at the default times of other firms in the portfolio. 
The impact of the default of some firm on the conditional default probability of other 
firms can arise via different channels. On the one hand it might be due to direct 
economic links between firms, such as a close business relationship or a strong 
borrower-lender relationship. For instance, the default probability of a corporate 
bank is likely to increase if one of its major borrowers defaults. This direct channel 
of default interaction is termed counterparty risk. On the other hand, changes in the 
conditional default probability of non-defaulted firms can be caused by information 
effects: investors might revise their estimate of the financial health of non-defaulted 
firms in light of the news that a particular firm has defaulted. In that case one speaks 
of information-based default contagion. 

A lot of recent research deals with the modelling of counterparty risk and default 
contagion for a number of reasons. First, there is substantial empirical evidence for 
interaction between default events. A recent example is provided by the downfall 
of the energy giant Enron in autumn 2001: the news that Enron had used illegal 
accounting practices led to rising credit spreads for many other corporations as 
bond investors lost confidence in the accounting statements of these corporations— 
a striking example of default contagion. Moreover, the stock price of major lenders
to Enron fell in anticipation of large losses on these loans, reflecting counterparty
risk. Formal empirical evidence for interaction between default events is listed in 
Notes and Comments. A second reason for modelling default contagion is that 
this might help to explain the clustering of defaults around economic recessions 
observed in real data. However, this is not to say that models with conditionally 
independent defaults are of little interest. Yu (2005a) shows that the low default 
correlations in models with conditionally independent defaults may be related more 
closely to an unsatisfactory modelling of state variables than to a problem of the 
approach per se. We will discuss this issue in more detail in Section 9.6.3 below. 
On a related note, not every default of a major corporation leads to changes in the 
credit spreads of the remaining firms, so conditional independence is often realistic. 
Moreover, more sophisticated models of default dependence may be hard to calibrate 
in practice, particularly for large portfolios. In any case, the discussion of models 
with conditionally independent defaults will provide a methodological basis for 
studying more complicated models. 


Notation. In keeping with Section 9.1, we use the following notation for our analysis
of dynamic portfolio credit risk models. We consider a portfolio of m obligors with
default times $\tau_i$ and default indicator processes $Y_{t,i} = Y_i(t) = I_{\{\tau_i \le t\}}$, $1 \le i \le m$,
on some generic probability space $(\Omega, \mathcal{F}, P)$, where the interpretation of P (physical
measure or equivalent martingale measure) will depend on the context. (Note
that we switch freely between the notation $X_{t,i}$ and the notation $X_i(t)$ for generic
processes defined at the level of individual obligors; generally we favour $X_{t,i}$ for
stochastic processes and $X_i(t)$ for deterministic ones, but we also depart from this
for reasons of notational elegance in individual formulas.)

In dynamic portfolio credit risk models it is convenient to consider survival
functions instead of distribution functions. As usual, $\bar{F}_i(t) = P(\tau_i > t)$ denotes
the tail or survival function of obligor i; the joint survival function is denoted
by $\bar{F}(t_1, \dots, t_m) = P(\tau_1 > t_1, \dots, \tau_m > t_m)$. Throughout our analysis we restrict
ourselves to models without simultaneous defaults. We may therefore denote the
ordered default times by $T_0 < T_1 < \dots < T_m$, where $T_0 = 0$ and, for $1 \le n \le m$,
$T_n = \min\{\tau_i : \tau_i > T_{n-1},\ 1 \le i \le m\}$. By $\xi_n \in \{1, \dots, m\}$ we denote the identity
of the firm defaulting at time $T_n$. Finally,

\[
A_n = \{1 \le i \le m : Y_i(T_n) = 0\} = \{1, \dots, m\} \setminus \{\xi_1, \dots, \xi_n\}
\]

is the set of non-defaulted firms immediately after time $T_n$.

As in the previous sections, $(\mathcal{F}_t)$ represents our background filtration, typically
generated by some observable process $(\Psi_t)$ representing economic factors. Moreover,
we introduce the filtrations $(\mathcal{H}^i_t)$, $1 \le i \le m$, $(\mathcal{H}_t)$ and $(\mathcal{G}_t)$ by

\[
\mathcal{H}^i_t = \sigma(\{Y_{s,i} : s \le t\}), \quad
\mathcal{H}_t = \mathcal{H}^1_t \vee \dots \vee \mathcal{H}^m_t \quad \text{and} \quad
\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{H}_t. \tag{9.50}
\]

$(\mathcal{H}^i_t)$ is the filtration generated by default observation for obligor i alone; $(\mathcal{H}_t)$ is
the filtration generated by default observation for all obligors; $(\mathcal{G}_t)$ contains default
information for all obligors and observable background information and thus
represents the information available to investors at time t. Often $(\mathcal{H}_t)$ is called the
internal filtration generated by the default times $\tau_i$, $1 \le i \le m$.


9.6.2 Conditionally Independent Default Times


In this section we discuss general mathematical properties of models with condi- 
tionally independent defaults; specific examples from the literature are considered 
in the next section. We start with a formal definition of conditionally independent 
default times. 


Definition 9.32. Given a probability space $(\Omega, \mathcal{F}, P)$ with background filtration
$(\mathcal{F}_t)$ and random times $\tau_1, \dots, \tau_m$, the $\tau_i$ are conditionally independent doubly
stochastic random times if

(i) each of the $\tau_i$ is a doubly stochastic random time in the sense of Definition 9.11
with background filtration $(\mathcal{F}_t)$ and $(\mathcal{F}_t)$-conditional hazard-rate
process $(\gamma_{t,i})$; and

(ii) the rvs $\tau_1, \dots, \tau_m$ are conditionally independent given $\mathcal{F}_\infty$, i.e. we have, for
all $t_1, \dots, t_m > 0$,

\[
P(\tau_1 \le t_1, \dots, \tau_m \le t_m \mid \mathcal{F}_\infty) = \prod_{i=1}^{m} P(\tau_i \le t_i \mid \mathcal{F}_\infty). \tag{9.51}
\]


Construction and simulation via thresholds. The lemma that follows extends 
Lemma 9.12. 


Lemma 9.33. Let $(\gamma_{t,1}), \dots, (\gamma_{t,m})$ be positive, $(\mathcal{F}_t)$-adapted processes such
that $\Gamma_{t,i} := \int_0^t \gamma_{s,i}\,ds$ is strictly increasing and finite for any $t > 0$. Let
$E = (E_1, \dots, E_m)'$ be a vector of independent, standard exponentially distributed rvs
independent of $\mathcal{F}_\infty$. Define $\tau_i$ by $\tau_i = \Gamma_i^{-1}(E_i)$. Then $\tau_1, \dots, \tau_m$ are conditionally
independent doubly stochastic random times.


Proof. According to Lemma 9.12, each of the $\tau_i$ is a doubly stochastic random time
with $(\mathcal{F}_t)$-hazard-rate process $(\gamma_{t,i})$. It remains to verify the conditional independence.
Using the fact that $\tau_i \le t \iff E_i \le \Gamma_{t,i}$, we have

\[
\begin{aligned}
P(\tau_1 \le t_1, \dots, \tau_m \le t_m \mid \mathcal{F}_\infty)
  &= P(E_1 \le \Gamma_{t_1,1}, \dots, E_m \le \Gamma_{t_m,m} \mid \mathcal{F}_\infty) \\
  &= \prod_{i=1}^{m} P(E_i \le \Gamma_{t_i,i} \mid \mathcal{F}_\infty) \\
  &= \prod_{i=1}^{m} P(\tau_i \le t_i \mid \mathcal{F}_\infty).
\end{aligned} \tag{9.52}
\]

Note that (9.52) holds since the rvs $\Gamma_{t_i,i}$ are measurable with respect to $\mathcal{F}_\infty$, whereas
the $E_i$ are mutually independent and independent of $\mathcal{F}_\infty$.


Lemma 9.33 is the basis for the following simulation algorithm.

Algorithm 9.34 (multivariate threshold simulation).

(1) Generate a trajectory of the hazard-rate processes $(\gamma_{t,i})$ for $i = 1, \dots, m$. Here
the same techniques as in the univariate case can be used; note, however,
that for a high-dimensional factor vector this step can become quite
time-consuming.

(2) Generate a vector E of independent standard exponentially distributed rvs
(the threshold vector) and set $\tau_i = \Gamma_i^{-1}(E_i)$, $1 \le i \le m$.
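
The two steps of Algorithm 9.34 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the text: the hazard-rate trajectories from step (1) are taken as given and are assumed to be constant on the cells of an equidistant time grid, so that each cumulative hazard $\Gamma_{t,i}$ is piecewise linear and can be inverted exactly inside a cell. The function name `threshold_default_times` and the flat test path are hypothetical.

```python
import math
import random


def threshold_default_times(hazard_paths, dt, rng):
    """Simulate one default time per obligor by threshold simulation.

    hazard_paths[i][k] is the (already simulated) hazard rate of obligor i,
    assumed constant on the grid cell [k*dt, (k+1)*dt).
    Returns math.inf for obligors whose threshold is not reached on the grid.
    """
    default_times = []
    for path in hazard_paths:
        threshold = rng.expovariate(1.0)      # E_i, standard exponential
        cumulative = 0.0                      # Gamma_{t,i} at the cell start
        tau = math.inf
        for k, gamma in enumerate(path):
            increment = gamma * dt
            if gamma > 0.0 and cumulative + increment >= threshold:
                # Invert Gamma inside the cell, where it is linear in t.
                tau = k * dt + (threshold - cumulative) / gamma
                break
            cumulative += increment
        default_times.append(tau)
    return default_times


# Illustrative input: 20000 obligors sharing a flat hazard rate of 0.5 on [0, 100).
rng = random.Random(1)
flat_path = [0.5] * 1000
taus = threshold_default_times([flat_path] * 20000, dt=0.1, rng=rng)
```

With a flat hazard rate the simulated default times are exponentially distributed, which gives a simple plausibility check for the implementation.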


As in the univariate case, Lemma 9.33 has a converse, which we state without the 
simple proof. 


Lemma 9.35. Let $\tau_1, \dots, \tau_m$ be conditionally independent doubly stochastic random
times with $(\mathcal{F}_t)$-hazard-rate processes $(\gamma_{t,i})$. Define a random vector E by
$E_i = \Gamma_i(\tau_i)$, $1 \le i \le m$. Then E is a vector of independent, standard exponentially
distributed rvs that is independent of $\mathcal{F}_\infty$, and $\tau_i = \Gamma_i^{-1}(E_i)$ almost surely.


Recursive default time simulation. We now describe a second recursive algorithm 
for simulating conditionally independent default times, which is sometimes more 
efficient than multivariate threshold simulation. Moreover, the algorithm generalizes 
naturally to reduced-form models with interacting intensities (see Section 9.8.3). We 
need the following lemma, which gives properties of the first default time $T_1$.


Lemma 9.36. Let $\tau_1, \dots, \tau_m$ be conditionally independent doubly stochastic random
times with hazard-rate processes $(\gamma_{t,1}), \dots, (\gamma_{t,m})$. Then $T_1$ is a doubly stochastic
random time with $(\mathcal{F}_t)$-conditional hazard-rate process $\bar\gamma_t := \sum_{i=1}^{m} \gamma_{t,i}$, $t \ge 0$.


Proof. Using the conditional independence of the $\tau_i$ we get

\[
P(T_1 > t \mid \mathcal{F}_\infty) = P(\tau_1 > t, \dots, \tau_m > t \mid \mathcal{F}_\infty)
  = \prod_{i=1}^{m} \exp\Bigl(-\int_0^t \gamma_{s,i}\,ds\Bigr),
\]

which is obviously equal to $\exp(-\int_0^t \bar\gamma_s\,ds)$. As this expression is $\mathcal{F}_t$-measurable,
the result follows.


Next we compute the conditional probability of the event $\{\xi_1 = i\}$ given the first
default time $T_1$ and full information about the background filtration.


Proposition 9.37. Under the assumptions of Lemma 9.36 we have

\[
P(\xi_1 = i \mid \mathcal{F}_\infty \vee \sigma(T_1)) = \gamma_i(T_1)/\bar\gamma(T_1), \quad i \in \{1, \dots, m\}.
\]


Proof. Conditional on $\mathcal{F}_\infty$, the $\tau_i$ are independent with deterministic hazard rate
$\gamma_i(t)$, so it is enough to prove the proposition for independent random times with
deterministic hazard rate. Fix some $t > 0$ and note that the probability of having
more than one default in the interval $(t - h, t]$ is of order $o(h)$, as the random vector
$(\tau_1, \dots, \tau_m)$ has a joint density. Hence

\[
\begin{aligned}
P(\{\xi_1 = i\} \cap \{T_1 \in (t-h, t]\})
  &= P(\{\tau_i \in (t-h, t]\} \cap \{\tau_j > t,\ j \ne i\}) + o(h) \\
  &= P(\tau_i \in (t-h, t]) \prod_{j \ne i} P(\tau_j > t) + o(h)
\end{aligned}
\]

by the independence of the $\tau_i$. Since $P(\tau_i > t) = \exp(-\int_0^t \gamma_i(s)\,ds)$, $1 \le i \le m$,
this equals

\[
\exp\Bigl(-\int_0^{t-h} \gamma_i(s)\,ds\Bigr)\Bigl(1 - \exp\Bigl(-\int_{t-h}^{t} \gamma_i(s)\,ds\Bigr)\Bigr)
  \times \prod_{j \ne i} \exp\Bigl(-\int_0^{t} \gamma_j(s)\,ds\Bigr) + o(h).
\]

Hence we get

\[
\lim_{h \to 0+} h^{-1} P(\{\xi_1 = i\} \cap \{T_1 \in (t-h, t]\}) = \gamma_i(t) \exp\Bigl(-\int_0^t \bar\gamma(s)\,ds\Bigr).
\]

Moreover, by Lemma 9.36,

\[
\lim_{h \to 0+} h^{-1} P(T_1 \in (t-h, t]) = \bar\gamma(t) \exp\Bigl(-\int_0^t \bar\gamma(s)\,ds\Bigr),
\]

so the claim follows from the definition of elementary conditional expectation and
l'Hôpital's rule.
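
As a quick sanity check of Proposition 9.37, it may help to look at the simplest special case, which is our own illustration rather than part of the proof: constant deterministic hazard rates $\gamma_1, \dots, \gamma_m$, so that the $\tau_i$ are independent exponential rvs and $\bar\gamma = \gamma_1 + \dots + \gamma_m$. The joint density of the first default time and the identity of the defaulter can then be written down directly.

```latex
\[
P(\xi_1 = i,\ T_1 \in dt)
  = \gamma_i e^{-\gamma_i t} \prod_{j \ne i} e^{-\gamma_j t}\,dt
  = \gamma_i e^{-\bar\gamma t}\,dt,
\qquad
P(T_1 \in dt) = \bar\gamma e^{-\bar\gamma t}\,dt,
\]
\[
\text{so that}\quad
P(\xi_1 = i \mid T_1 = t) = \frac{\gamma_i e^{-\bar\gamma t}}{\bar\gamma e^{-\bar\gamma t}}
  = \frac{\gamma_i}{\bar\gamma}.
\]
```

For instance, with two firms and hazard rates $\gamma_1 = 0.01$ and $\gamma_2 = 0.03$, firm 2 is the first defaulter with probability $0.03/0.04 = 0.75$, whatever the value of $T_1$.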


Algorithm 9.38 (recursive default time simulation). This algorithm simulates a
realization of the sequence $(T_n, \xi_n)$ up to some maturity date T. Recall that for
$n \ge 1$ the set of non-defaulted firms immediately after $T_n$ is denoted by $A_n$ and
set $A_0 := \{1, \dots, m\}$. Define $\bar\gamma^{\,n}_t := \sum_{i \in A_n} \gamma_{t,i}$, $0 \le n \le m$. Then the algorithm
proceeds in the following steps.

(1) Generate a trajectory of the hazard-rate processes $(\gamma_{t,i})$.

(2) Generate $T_1$ by standard univariate threshold simulation, using the fact that
$T_1$ has hazard rate $(\bar\gamma^{\,0}_t)$ by Lemma 9.36.

(3) Determine $\xi_1$ as a realization of an rv $\xi$ with $P(\xi = i) = \gamma_i(T_1)/\bar\gamma^{\,0}(T_1)$ (using
Proposition 9.37).

(4) If $T_1 > T$ stop. Otherwise note that, for conditionally independent defaults,

\[
\begin{aligned}
P(T_2 - T_1 > t \mid T_1, \xi_1, \mathcal{F}_\infty)
  &= P(\tau_j > T_1 + t,\ j \in A_1 \mid T_1, \xi_1, \mathcal{F}_\infty) \\
  &= \exp\Bigl(-\int_{T_1}^{T_1 + t} \bar\gamma^{\,1}_s\,ds\Bigr).
\end{aligned} \tag{9.53}
\]

Generate the waiting time $T_2 - T_1$ via univariate threshold simulation using
(9.53); determine $\xi_2$ as before, using the fact that, for $i \in A_1$,

\[
P(\xi_2 = i \mid T_1, T_2, \xi_1, \mathcal{F}_\infty) = \gamma_i(T_2)/\bar\gamma^{\,1}(T_2).
\]

(5) Proceed in this way until $T_n > T$ for some $n \le m$ or until all firms have
defaulted.

Recursive default time simulation is particularly efficient if we want to simulate
only defaults occurring before some maturity date T and if defaults are rare. In that
case, typically $T_n > T$ already for n relatively small, so only a few ordered default
times need to be simulated. With multivariate threshold simulation, on the other
hand, we need to simulate the default times of all obligors in the portfolio.
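
A minimal Python sketch of Algorithm 9.38 follows. As with any such illustration, several ingredients are our own assumptions and not prescribed by the text: the hazard-rate trajectories of step (1) are supplied by the caller and are piecewise constant on an equidistant grid, the function name `recursive_default_times` is hypothetical, and the example portfolio uses constant hazard rates purely so that the output is easy to check.

```python
import random


def recursive_default_times(hazard_paths, dt, horizon, rng):
    """Simulate the ordered defaults (T_n, xi_n) with T_n <= horizon.

    hazard_paths[i][k] is the hazard rate of obligor i on [k*dt, (k+1)*dt).
    Returns a list of (default_time, obligor_index) pairs in time order.
    """
    alive = list(range(len(hazard_paths)))    # A_0: all obligors
    n_cells = len(hazard_paths[0])
    events = []
    t, k = 0.0, 0                             # current time and its grid cell
    while alive:
        threshold = rng.expovariate(1.0)      # fresh threshold for the waiting time
        cumulative = 0.0
        next_default = None
        while k < n_cells and t < horizon:
            total = sum(hazard_paths[i][k] for i in alive)  # summed hazard of A_n
            cell_end = (k + 1) * dt
            increment = total * (cell_end - t)
            if total > 0.0 and cumulative + increment >= threshold:
                next_default = t + (threshold - cumulative) / total
                break
            cumulative += increment
            t, k = cell_end, k + 1
        if next_default is None or next_default > horizon:
            break
        # Identity of the defaulter: probabilities proportional to hazard rates.
        weights = [hazard_paths[i][k] for i in alive]
        name = rng.choices(alive, weights=weights)[0]
        events.append((next_default, name))
        alive.remove(name)
        t = next_default
    return events


# Illustrative portfolio: three obligors with constant hazards 0.1, 0.2 and 0.7.
rng = random.Random(7)
paths = [[0.1] * 100, [0.2] * 100, [0.7] * 100]
runs = [recursive_default_times(paths, 0.5, 50.0, rng) for _ in range(5000)]
```

In this toy portfolio the first default time has hazard rate 1.0 and obligor 2 should be the first defaulter in roughly 70% of the runs, in line with Lemma 9.36 and Proposition 9.37.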


Martingale intensities. The following proposition shows that, for conditionally 
independent defaults, martingale intensities and hazard rates coincide. 


Proposition 9.39. Let $\tau_1, \dots, \tau_m$ be conditionally independent doubly stochastic
random times with hazard-rate processes $(\gamma_{t,1}), \dots, (\gamma_{t,m})$. Then the process
$M_{t,i} := Y_{t,i} - \int_0^{t \wedge \tau_i} \gamma_{s,i}\,ds$ is a $(\mathcal{G}_t)$-martingale with $(\mathcal{G}_t)$ as in (9.50).


Proof. We know from Proposition 9.15 that $(M_{t,i})$ is a martingale with respect to
the filtration $(\mathcal{G}^i_t)$ with $\mathcal{G}^i_t = \mathcal{F}_t \vee \mathcal{H}^i_t$, i.e. that $E(M_{s,i} \mid \mathcal{G}^i_t) = M_{t,i}$, $s \ge t$.
However, this does not automatically imply that $(M_{t,i})$ is also a $(\mathcal{G}_t)$-martingale, as
$\mathcal{G}^i_t \subset \mathcal{G}_t$, and so we could have $E(M_{s,i} \mid \mathcal{G}_t) \ne E(M_{s,i} \mid \mathcal{G}^i_t)$. In fact, this typically
happens in copula models (see Section 9.8.1 below). In the present situation the
conditional independence of the $\tau_i$ permits us to overcome this difficulty. This is quite
intuitive: since the $\tau_i$ are conditionally independent, default information for obligor
$j \ne i$ is of no use in predicting the default of obligor i. A formal argument is as
follows. Using Lemma 9.35, we may assume that there is a vector E of independent,
standard exponential rvs, independent of $\mathcal{F}_\infty$, such that for all $1 \le j \le m$ we have
$\tau_j = \Gamma_j^{-1}(E_j)$. Obviously, $\tau_i$ is independent of $E_j$ for $j \ne i$, so

\[
E(M_{s,i} \mid \mathcal{G}^i_t \vee \sigma(\{E_j : j \ne i\})) = E(M_{s,i} \mid \mathcal{G}^i_t) = M_{t,i}. \tag{9.54}
\]

On the other hand, if we know $E_j$ and the trajectory $(\gamma_{u,j})_{0 \le u \le t}$, we can determine
$Y_{u,j}$ for $0 \le u \le t$. Hence $\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{H}^1_t \vee \dots \vee \mathcal{H}^m_t$ is a subset of
$\mathcal{G}^i_t \vee \sigma(\{E_j : j \ne i\})$, so (9.54) implies that $E(M_{s,i} \mid \mathcal{G}_t) = M_{t,i}$, as required.


Remark 9.40 (pricing of single-name credit products). Suppose that $\tau_1, \dots, \tau_m$
are conditionally independent doubly stochastic random times. Consider a single-name
credit product with maturity T whose pay-off H depends only on the default
history of firm i and on the evolution of default-free security prices and is thus
$\mathcal{G}^i_T$-measurable. A typical example is a vulnerable claim of the form $H = I_{\{\tau_i > T\}} X$ for
an $\mathcal{F}_T$-measurable rv X. A similar argument to that in the proof of Proposition 9.39
shows that

\[
E^Q\Bigl(\exp\Bigl(-\int_t^T r_s\,ds\Bigr) H \Bigm| \mathcal{G}^i_t\Bigr)
  = E^Q\Bigl(\exp\Bigl(-\int_t^T r_s\,ds\Bigr) H \Bigm| \mathcal{G}_t\Bigr), \quad t \le T,
\]

where $(r_t)$ is the $(\mathcal{F}_t)$-adapted default-free short rate. Now, the left-hand side of the
above equation gives the price of the claim H in a single-firm model where the
information available to investors at time t is given by $\mathcal{G}^i_t$, whereas the right-hand
side gives the price of H in the portfolio model where at time t investors have
access to the larger information set $\mathcal{G}_t$, containing default information on all firms
in the portfolio. Hence pricing formulas for single-name credit products obtained
in a single-firm model with a doubly stochastic default time, such as the pricing
formulas from Theorem 9.23, remain valid in a portfolio model with conditionally
independent default times. If we go beyond conditional independence this is no
longer true, as will be discussed in Section 9.8.1.


9.6.3 Examples and Applications 


In most models with conditionally independent defaults, hazard rates are modelled 
as linear combinations of independent affine diffusions, possibly with jumps. A 
typical model is as follows:

\[
\gamma_{t,i} = \gamma_{i0} + \sum_{j=1}^{p} \gamma_{ij}\,\Psi^{\mathrm{syst}}_{t,j} + \Psi^{\mathrm{id}}_{t,i},
\quad 1 \le i \le m. \tag{9.55}
\]

Here $(\Psi^{\mathrm{syst}}_{t,j})$, $1 \le j \le p$, and $(\Psi^{\mathrm{id}}_{t,i})$, $1 \le i \le m$, are independent CIR square-root
diffusions or, slightly more generally, basic affine jump diffusions as in (9.45); the
factor weights $\gamma_{ij}$, $0 \le j \le p$, are non-negative constants. Obviously, $(\Psi^{\mathrm{syst}}_t)$
represents the common or systematic factors, whereas $(\Psi^{\mathrm{id}}_{t,i})$ is an idiosyncratic factor
process affecting only the hazard rate of obligor i. Note that the weight of the
idiosyncratic factor can be incorporated into the parameters of the dynamics of
$(\Psi^{\mathrm{id}}_{t,i})$, so we do not need an extra factor weight. Throughout this section we assume
that the background filtration is generated by $(\Psi^{\mathrm{syst}}_t)$ and $(\Psi^{\mathrm{id}}_{t,i})$, $1 \le i \le m$. In
practical applications of the model, the current value of these processes is derived
from observed prices of defaultable bonds.

We now present a few examples proposed in the literature. Duffee (1999) has estimated
a model of the form (9.55) with $p = 2$; in his model all factor processes are
assumed to follow CIR square-root diffusions, so that their dynamics are characterized
by the parameter triplet $(\kappa, \bar\theta, \sigma)$. In Duffee's model, $(\Psi^{\mathrm{syst}}_t)$ represents factors
driving the default-free short rate; the parameters of these processes are estimated
from treasury data. The factor weights $\gamma_{ij}$ and the parameters of $(\Psi^{\mathrm{id}}_{t,i})$, on the other
hand, are estimated from corporate bond-price data.

In their influential case study on CDO pricing, Duffie and Gârleanu (2001) use
basic affine jump diffusion processes of the form (9.45) to model the factors driving
the hazard rates. Jumps in $(\gamma_t)$ represent shocks which increase the default probability
of a firm. They consider a homogeneous model with one systematic factor,
i.e. $\gamma_{t,i} = \Psi^{\mathrm{syst}}_t + \Psi^{\mathrm{id}}_{t,i}$, $1 \le i \le m$, and assume that the speed of mean-reversion $\kappa$,
the volatility $\sigma$ and the mean jump size $\mu$ are identical for $(\Psi^{\mathrm{syst}}_t)$ and $(\Psi^{\mathrm{id}}_{t,i})$. It is
straightforward to show that this implies that the sum $\gamma_{t,i} = \Psi^{\mathrm{syst}}_t + \Psi^{\mathrm{id}}_{t,i}$ follows a
basic affine jump diffusion with parameters $\kappa$, $\bar\theta^{\mathrm{syst}} + \bar\theta^{\mathrm{id}}$, $\sigma$, $(l^0)^{\mathrm{syst}} + (l^0)^{\mathrm{id}}$ and $\mu$;
the parameters of $(\gamma_{t,i})$ used in Duffie and Gârleanu (2001) can be found in the row
labelled “base case” in Table 9.2.


Pricing single-name credit products. As discussed in Remark 9.40, with conditionally
independent defaults, pricing formulas obtained in a single-firm model remain
valid in the portfolio context. Moreover, with hazard rates as in (9.55) most actual
computations can be reduced to a one-dimensional problem involving affine processes,
to which the results of Section 9.5 apply. As a simple specific example we
consider the computation of the conditional survival probability of obligor i. We
obtain from Remark 9.40 and Theorem 9.23 that

\[
P(\tau_i > T \mid \mathcal{G}_t) = P(\tau_i > T \mid \mathcal{G}^i_t)
  = I_{\{\tau_i > t\}} E\Bigl(\exp\Bigl(-\int_t^T \gamma_{s,i}\,ds\Bigr) \Bigm| \mathcal{F}_t\Bigr).
\]

For hazard-rate processes of the form (9.55) this equals

\[
I_{\{\tau_i > t\}}\, e^{-\gamma_{i0}(T-t)}\,
E\Bigl(\exp\Bigl(-\int_t^T \Psi^{\mathrm{id}}_{s,i}\,ds\Bigr) \Bigm| \mathcal{F}_t\Bigr)
\times \prod_{j=1}^{p} E\Bigl(\exp\Bigl(-\gamma_{ij}\int_t^T \Psi^{\mathrm{syst}}_{s,j}\,ds\Bigr) \Bigm| \mathcal{F}_t\Bigr). \tag{9.56}
\]


Table 9.2. Parameter sets of the Duffie-Gârleanu model used in Figure 9.11.

Parameter set          $\kappa$   $\bar\theta$   $\sigma$   $l^0$   $\mu$
Pure diffusion         0.6        0.0505         0.141      0       0
Base case              0.6        0.02           0.141      0.2     0.1
High jump intensity    0.6        0.0018         0.141      0.32    0.1



Each of the conditional expectations in (9.56) can now be computed using the results 
on one-dimensional affine models from Section 9.5. More general models, where 
hazard rates are given by a general multivariate affine process (and not simply by a 
linear combination of independent one-dimensional affine processes) can be dealt 
with using the general affine-model technology from Duffie, Pan and Singleton 
(2000). 
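
To make the factorization (9.56) concrete, the following Python sketch evaluates a survival probability at $t = 0$ when all factors are CIR square-root diffusions without jumps. It uses the standard closed-form CIR expression for $E(\exp(-\int_0^T \Psi_s\,ds))$, written in a common textbook parametrization that may differ in form from (9.39) and (9.40). The function names and all numerical inputs are illustrative assumptions, and the jump case of (9.45) is not covered.

```python
import math


def cir_discount(kappa, theta, sigma, psi0, horizon):
    """E[exp(-int_0^horizon Psi_s ds)] for a CIR process started at psi0.

    Assumes dPsi = kappa*(theta - Psi) dt + sigma*sqrt(Psi) dW with sigma > 0.
    """
    if horizon == 0.0:
        return 1.0
    h = math.sqrt(kappa * kappa + 2.0 * sigma * sigma)
    growth = math.exp(h * horizon) - 1.0
    denom = (h + kappa) * growth + 2.0 * h
    b = 2.0 * growth / denom
    log_a = (2.0 * kappa * theta / sigma ** 2) * (
        math.log(2.0 * h / denom) + 0.5 * (kappa + h) * horizon
    )
    return math.exp(log_a - b * psi0)


def survival_probability(gamma0, weights, syst_params, idio_params, horizon):
    """P(tau_i > horizon) at t = 0 via the product structure of (9.56).

    syst_params[j] and idio_params are (kappa, theta, sigma, psi0) tuples.
    """
    prob = math.exp(-gamma0 * horizon)
    prob *= cir_discount(*idio_params, horizon)
    for weight, (kappa, theta, sigma, psi0) in zip(weights, syst_params):
        if weight == 0.0:
            continue                          # factor does not load on obligor i
        # weight * Psi is again a CIR process with rescaled parameters.
        prob *= cir_discount(kappa, weight * theta, math.sqrt(weight) * sigma,
                             weight * psi0, horizon)
    return prob


# Illustrative inputs only: one systematic and one idiosyncratic CIR factor.
p_surv = survival_probability(
    gamma0=0.0,
    weights=[1.0],
    syst_params=[(0.6, 0.02, 0.141, 0.03)],
    idio_params=(0.6, 0.02, 0.141, 0.02),
    horizon=5.0,
)
```

The rescaling inside the loop relies on the fact that a positive multiple $c\,\Psi$ of a CIR process with parameters $(\kappa, \bar\theta, \sigma)$ is again a CIR process, with parameters $(\kappa, c\bar\theta, \sqrt{c}\,\sigma)$, so a single one-dimensional routine suffices for every term in the product.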


Static version of the model. It is interesting to look at the implications of conditional
independence and the factor structure (9.55) of the hazard rates for the distribution
of the default indicators at a given point in time T, as this links our analysis
to the static models of Chapter 8. For simplicity we suppose that the idiosyncratic
factor $(\Psi^{\mathrm{id}}_{t,i})$ vanishes for all firms. Fix some $T > 0$ and consider the random vector
$Y_T = (Y_{T,1}, \dots, Y_{T,m})$. We get, for $y \in \{0, 1\}^m$,

\[
P(Y_T = y) = E(P(Y_T = y \mid \mathcal{F}_\infty))
  = E\Bigl(\prod_{j : y_j = 1} P(\tau_j \le T \mid \mathcal{F}_\infty) \prod_{j : y_j = 0} P(\tau_j > T \mid \mathcal{F}_\infty)\Bigr).
\]

Now we obviously have, using (9.55),

\[
P(\tau_i \le T \mid \mathcal{F}_\infty) = 1 - \exp\Bigl(-\gamma_{i0} T - \sum_{j=1}^{p} \gamma_{ij} \int_0^T \Psi^{\mathrm{syst}}_{s,j}\,ds\Bigr). \tag{9.57}
\]

This shows that $Y_T$ follows a Bernoulli mixture model with p-factor structure as in
Definition 8.10 with factor vector given by

\[
\boldsymbol{\Psi} := \Bigl(\int_0^T \Psi^{\mathrm{syst}}_{s,1}\,ds, \dots, \int_0^T \Psi^{\mathrm{syst}}_{s,p}\,ds\Bigr)'
\]

and conditional default probabilities $p_i(\boldsymbol{\Psi})$ as in (9.57).

Default correlation. As we have seen in Chapter 8, default correlations (defined
as correlation $\rho(Y_{T,i}, Y_{T,j})$, $i \ne j$, of the default indicators) are crucial for the
tail of the credit loss distribution. In computing default correlations in models
with conditionally independent defaults it is more convenient to work with the
survival indicator $1 - Y_{T,i}$. By the definition of standard linear correlation we
have

\[
\rho(Y_{T,i}, Y_{T,j}) = \rho(1 - Y_{T,i}, 1 - Y_{T,j})
  = \frac{P(\tau_i > T, \tau_j > T) - \bar{F}_i(T)\bar{F}_j(T)}
         {\bigl(\bar{F}_i(T)(1 - \bar{F}_i(T))\,\bar{F}_j(T)(1 - \bar{F}_j(T))\bigr)^{1/2}}. \tag{9.58}
\]

For models with hazard rates as in (9.55), the computation of the survival probabilities
$\bar{F}_i(T)$ using affine-model technology has been discussed above. For the joint
survival probability we obtain, using conditional independence,

\[
\begin{aligned}
P(\tau_i > T, \tau_j > T) &= E(P(\tau_i > T, \tau_j > T \mid \mathcal{F}_\infty)) \\
  &= E(P(\tau_i > T \mid \mathcal{F}_\infty)\, P(\tau_j > T \mid \mathcal{F}_\infty)) \\
  &= E\Bigl(\exp\Bigl(-\int_0^T (\gamma_{s,i} + \gamma_{s,j})\,ds\Bigr)\Bigr).
\end{aligned} \tag{9.59}
\]

For hazard rates of the form (9.55), expression (9.59) can be decomposed in a similar
way to the decomposition in (9.56) and can thus be evaluated using our results on
one-dimensional affine models.
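
Formulas (9.58) and (9.59) are easy to try out numerically. The sketch below is a deliberately crude illustration and not the affine computation itself: it assumes an exchangeable model in which every obligor has the same hazard rate and replaces the distribution of the integrated hazard $\int_0^T \gamma_s\,ds$ by a hypothetical discrete distribution, so that the expectations reduce to finite sums. The function names and the numbers are our own.

```python
import math


def default_correlation(surv_i, surv_j, joint_surv):
    """Default correlation from survival probabilities, as in (9.58)."""
    numerator = joint_surv - surv_i * surv_j
    denominator = math.sqrt(surv_i * (1.0 - surv_i) * surv_j * (1.0 - surv_j))
    return numerator / denominator


def exchangeable_correlation(integrated_hazards, probabilities):
    """Default correlation when all obligors share one integrated hazard.

    integrated_hazards[k] is a possible value of int_0^T gamma_s ds and
    probabilities[k] its probability (a discrete stand-in for the true law).
    """
    pairs = list(zip(integrated_hazards, probabilities))
    surv = sum(p * math.exp(-g) for g, p in pairs)          # E exp(-Gamma)
    joint = sum(p * math.exp(-2.0 * g) for g, p in pairs)   # (9.59): E exp(-2 Gamma)
    return default_correlation(surv, surv, joint)


# Same mean integrated hazard (0.05), increasing dispersion.
rho_none = exchangeable_correlation([0.05], [1.0])
rho_low = exchangeable_correlation([0.01, 0.09], [0.5, 0.5])
rho_high = exchangeable_correlation([0.0, 0.10], [0.5, 0.5])
```

In this toy setting the default correlation is zero when the integrated hazard is deterministic and grows as its distribution becomes more dispersed.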

It is often claimed that the default correlation values that can be attained in models
with conditionally independent defaults are too low compared with empirical default
correlations (see, for example, Hull and White 2001; Schönbucher and Schubert
2001). Since default correlations do have a significant impact on the loss distribution
generated by a model, we discuss this issue further. As a concrete example we use
the Duffie-Gârleanu model and assume that $(\Psi^{\mathrm{id}}_{t,i})$ vanishes. As discussed above, in
that case the default indicator vector $Y_T$ follows an exchangeable Bernoulli mixture
model with mixing variable Q given by $1 - \exp(-\int_0^T \Psi^{\mathrm{syst}}_s\,ds)$.

We have seen in Section 8.4.1 that in exchangeable Bernoulli mixture models
every default correlation $\rho \in (0, 1)$ can be obtained by choosing the variance
of the mixing variable sufficiently high. It follows that in the Duffie-Gârleanu
model high levels of default correlation can be obtained if the variance of the rv
$\Gamma_T := \int_0^T \Psi^{\mathrm{syst}}_s\,ds$ is sufficiently high. A high variance of $\Gamma_T$ can be obtained by
choosing a high value for the volatility $\sigma$ of the diffusion part of $(\Psi^{\mathrm{syst}}_t)$ or by
choosing a high value for the mean of the jump-size distribution $\mu$ or for the jump
intensity $l^0$. A high value for $\sigma$ translates into very volatile day-to-day fluctuations
of credit spreads, which might contradict the behaviour of real bond-price data. This
shows that it might be difficult to generate very high levels of default correlation
in models where hazard rates follow pure diffusion processes (see, however, Yu
(2005a) for an alternative view).

In the Duffie-Gârleanu model we may alternatively raise the frequency or size
of the jumps in the hazard rate by increasing $l^0$ or $\mu$. This is a very effective
mechanism for generating default correlation, as is shown in Figure 9.11. In fact,
this additional flexibility in modelling default correlations is an important motivation
for considering affine jump diffusions instead of the simpler CIR diffusion
models.

Figure 9.11. (a) Default correlations for varying time to maturity and three different
parameter sets in the Duffie-Gârleanu model for $(\Psi^{\mathrm{id}}_t) \equiv 0$ and different parameter sets for $(\Psi^{\mathrm{syst}}_t)$.
The parameters of $(\Psi^{\mathrm{syst}}_t)$ are given in Table 9.2. We see that by increasing the intensity of
jumps in $(Z_t)$ the default correlation is increased substantially. (b) The survival probabilities
for the three parameter sets are essentially equal, so that the differences in default correlations
are solely due to the impact of the dynamics of $(\Psi^{\mathrm{syst}}_t)$ on the dependence structure of the
default times.

These qualitative findings obviously carry over to other models with conditionally 
independent defaults. Summing up, we conclude that it is certainly possible to 
generate high levels of default correlation in models with conditionally independent 
defaults; however, the required models for the hazard-rate processes may become 
relatively complex. 


First-to-default swaps. As a final application we study the pricing of first-to-default
swaps in models with conditionally independent defaults. We consider a
portfolio of m firms. Premium payments on the swap are due at N points in time
$0 < t_1 < \dots < t_N =: T$. Provided that $T_1 > t_n$, the premium at time $t_n$ is of the
form $x(t_n - t_{n-1})$; at $T_1$ premium payments stop. For simplicity we neglect accrued
premium payments. The default payment occurs at time $T_1$ provided $T_1 \le T$. We
assume that the payment depends on the identity $\xi_1$ of the first defaulting firm (perhaps
because of differing exposure sizes) but is otherwise deterministic, i.e. there
are constants $l_1, \dots, l_m$ such that the default payment is equal to $l_i$ if $T_1 \le T$
and $\xi_1 = i$. As usual, the fair spread $x^*$ of the swap is the value of x such
that at $t = 0$ the default payment leg and premium payment leg have the same
value.

Since, in practice, first-to-default swaps are always priced relative to traded single-name
CDSs, it is natural to adopt the martingale-modelling approach. We assume
that under the equivalent martingale measure Q the default times $\tau_i$ are conditionally
independent doubly stochastic random times with hazard rates of the form (9.55);
moreover, the risk-free short rate $(r_t)$ is assumed to be of the form (9.55). In this
set-up, for generic swap spread x the value of the premium payment equals

\[
V^{\mathrm{prem}} = \sum_{n=1}^{N} E^Q\Bigl(\exp\Bigl(-\int_0^{t_n} r_s\,ds\Bigr) I_{\{T_1 > t_n\}}\Bigr)\, x\,(t_n - t_{n-1}). \tag{9.60}
\]

Using Theorem 9.23 and Lemma 9.36 we get

\[
E^Q\Bigl(\exp\Bigl(-\int_0^{t_n} r_s\,ds\Bigr) I_{\{T_1 > t_n\}}\Bigr)
  = E^Q\Bigl(\exp\Bigl(-\int_0^{t_n} \Bigl(r_s + \sum_{i=1}^{m} \gamma_{s,i}\Bigr)\,ds\Bigr)\Bigr).
\]

For hazard rates and a risk-free short rate of the form (9.55) this can be expressed
as a product of expectations of the form $E^Q(\exp(-C \int_0^{t_n} \Psi_s\,ds))$ for a constant C
and a one-dimensional affine jump diffusion $(\Psi_t)$, so the premium payments can be
computed using the methods developed in Section 9.5. Next we turn to the default
payments. We have

\[
V^{\mathrm{def}} = \sum_{i=1}^{m} l_i\, E^Q\Bigl(\exp\Bigl(-\int_0^{T_1} r_s\,ds\Bigr) I_{\{T_1 \le T\}} I_{\{\xi_1 = i\}}\Bigr).
\]

We begin by computing

\[
E^Q\Bigl(\exp\Bigl(-\int_0^{T_1} r_s\,ds\Bigr) I_{\{T_1 \le T\}} I_{\{\xi_1 = i\}} \Bigm| \mathcal{F}_\infty\Bigr).
\]

Conditioning on $T_1$ we obtain that this equals

\[
\int_0^T \exp\Bigl(-\int_0^t r(s)\,ds\Bigr)\, Q(\xi_1 = i \mid T_1 = t, \mathcal{F}_\infty)\, f_{T_1 \mid \mathcal{F}_\infty}(t)\,dt, \tag{9.61}
\]

where $f_{T_1 \mid \mathcal{F}_\infty}(t)$ is the Q-density of $T_1$ given $\mathcal{F}_\infty$. By Lemma 9.36,

\[
f_{T_1 \mid \mathcal{F}_\infty}(t) = \bar\gamma(t) \exp\Bigl(-\int_0^t \bar\gamma(s)\,ds\Bigr).
\]

Moreover, by Proposition 9.37,

\[
Q(\xi_1 = i \mid T_1 = t, \mathcal{F}_\infty) = \gamma_i(t)/\bar\gamma(t).
\]

Hence (9.61) equals $\int_0^T \gamma_i(t) \exp(-\int_0^t (r(s) + \bar\gamma(s))\,ds)\,dt$. To compute the value
of $V^{\mathrm{def}}$ we thus have to compute $E^Q(\int_0^T \gamma_{t,i} \exp(-\int_0^t (r_s + \bar\gamma_s)\,ds)\,dt)$. For hazard
rates of the form (9.55) this can be evaluated using the extended transform discussed
in Section 9.5.3; we omit the details. If the default payments $l_i$ are all identical, the
first-to-default swap can be priced like a single-name CDS, with the hazard rate of
the default time given by $(\bar\gamma_t)$; this follows immediately from Lemma 9.36.
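
The structure of the two legs is easiest to see in a degenerate special case that we add here purely for illustration: a constant short rate r and constant hazard rates $\gamma_i$ (so no affine factor dynamics at all), with $t_0 = 0$. Then the expectations in (9.60) and (9.61) become elementary, with $V^{\mathrm{prem}} = x \sum_n (t_n - t_{n-1}) e^{-(r + \bar\gamma) t_n}$ and $V^{\mathrm{def}} = \sum_i l_i \gamma_i (1 - e^{-(r + \bar\gamma)T})/(r + \bar\gamma)$. The function name and the inputs below are hypothetical.

```python
import math


def ftd_fair_spread(short_rate, hazards, losses, payment_dates):
    """Fair first-to-default spread for constant short rate and hazard rates.

    hazards[i] and losses[i] are gamma_i and l_i; payment_dates are the
    premium dates t_1 < ... < t_N, with t_0 = 0 assumed.
    """
    total_hazard = sum(hazards)               # hazard rate of T_1 (Lemma 9.36)
    discount_rate = short_rate + total_hazard
    # Premium leg per unit of spread, neglecting accrued premiums.
    annuity = 0.0
    previous = 0.0
    for t in payment_dates:
        annuity += (t - previous) * math.exp(-discount_rate * t)
        previous = t
    # Default leg: sum_i l_i * int_0^T gamma_i exp(-(r + gamma_bar) t) dt.
    maturity = payment_dates[-1]
    integral = (1.0 - math.exp(-discount_rate * maturity)) / discount_rate
    default_leg = sum(l * g for l, g in zip(losses, hazards)) * integral
    return default_leg / annuity


# Illustrative basket: three names, quarterly premiums over five years.
dates = [0.25 * n for n in range(1, 21)]
spread = ftd_fair_spread(0.05, [0.01, 0.02, 0.03], [0.6, 0.6, 0.6], dates)
single = ftd_fair_spread(0.05, [0.06], [0.6], dates)
```

With identical default payments the basket spread coincides with the spread of a single-name contract whose hazard rate is the sum of the individual hazard rates, which is the constant-rate analogue of the remark that closes the paragraph above.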

In certain special cases higher-order default swaps can be evaluated analytically. 
However, in most cases that are practically relevant, one has to use Monte Carlo 
simulation, and the recursive default time simulation algorithm from the previous 
section comes in handy.


Notes and Comments


The empirical literature on default contagion and counterparty risk has two different 
strands. On the one hand, there are papers such as Collin-Dufresne, Goldstein and 
Helwege (2003) or Lang and Stulz (1992) which focus on the impact of defaults or 
credit-spread widenings of a given firm on the credit spreads or stock returns of other 
firms and hence on default contagion under a risk-neutral measure. For instance, 
Collin-Dufresne, Goldstein and Helwege (2003) found that, even after controlling 
for other macroeconomic variables influencing bond returns, the return of large 
corporate bond indices in months where one or several large firms experienced a 
significant (above 200 basis points) widening in credit spreads is significantly lower 
than the return of these indices in other months; this is clear evidence supporting 
contagion. Das, Duffie and Kapadia (2005) and Jarrow and Yu (2001), on the other 
hand, look at default contagion under the physical measure. Jarrow and Yu provide 
a lot of anecdotal evidence for counterparty risk in small portfolios. Das, Duffie 
and Kapadia formally test whether models with conditionally independent defaults 
driven by observable macroeconomic factors are sufficient to explain the degree of 
clustering one finds in actual default data for large portfolios (their default database 
contains approximately 2000 firms). In their words, they “do not find substantial 
evidence of default clustering beyond that predicted by the doubly stochastic model 
in their data”. These findings are only preliminary, but indicate nonetheless that 
default contagion is relevant for the pricing and the hedging of portfolio-related 
credit derivatives; for credit risk management issues, on the other hand, a model 
with conditionally independent defaults and appropriately specified factors might 
be sufficient. 

The results of Section 9.6.2 are well known; for an alternative treatment at textbook 
level, see, for example, Chapter 9 of Bielecki and Rutkowski (2002). The simulation 
of conditionally independent default times is discussed in Duffie and Singleton 
(1999b) (see also Duffie and Singleton 2003). Further empirical work on affine 
models for credit portfolios includes that of Duffee (1999) and Driessen (2005). 
Default correlations in models with conditionally independent defaults are discussed 
in Yu (2005a). 


9.7 Copula Models 


Copula models are widely used in practice for the pricing of basket credit derivatives 
and CDO structures. They are easy to calibrate to a given term structure of defaultable 
bonds or CDS spreads; moreover, they can be used to model default contagion. 
In this section we introduce these models; particular attention will be given to 
models where the copula has a factor structure, since these models have a convenient 
representation as mixture models. Dynamic properties of copula models and default 
contagion are studied in Section 9.8 below. 


9.7.1 Definition and General Properties 


To motivate our definition of copula models we return briefly to models with 
conditionally independent defaults. According to Lemma 9.35, if $\tau_1, \ldots, \tau_m$ are 
conditionally independent doubly stochastic random times with $(\mathcal{F}_t)$-adapted 
hazard-rate processes $(\gamma_{t,1}), \ldots, (\gamma_{t,m})$ and cumulative hazard processes 
$\Gamma_{t,i} = \int_0^t \gamma_{s,i}\,ds$, we can find a random vector $E$ with independent, unit 
exponentially distributed components, independent of $\mathcal{F}_\infty$, such that 
$\tau_i = \inf\{t \ge 0 : \Gamma_{t,i} \ge E_i\}$. We may rewrite this as 

$$\tau_i = \inf\{t \ge 0 : 1 - \exp(-\Gamma_{t,i}) \ge \tilde{U}_i := 1 - \exp(-E_i)\}. \tag{9.62}$$

Note that $\tilde{U} = (\tilde{U}_1, \ldots, \tilde{U}_m)'$ is a vector of $m$ independent rvs with uniform margins, 
so that its joint df is the $m$-dimensional independence copula (see Section 5.1.2). 
In the copula models we generalize this construction and replace the independence 
copula with some other copula; obviously, this allows for a richer dependence structure 
of the $\tau_i$ than in the case of conditionally independent default times. Defining 
$U_i := 1 - \tilde{U}_i$ we may rewrite (9.62) as $\tau_i = \inf\{t \ge 0 : \exp(-\Gamma_{t,i}) \le U_i\}$. To be 
in line with the literature we work with this description of the $\tau_i$ and define copula 
models in terms of the copula $C$ of $U$ (or, equivalently, the survival copula of $\tilde{U}$; 
survival copulas were introduced in Section 5.1.5). We call $C$ the conditional survival 
copula of the firms; this terminology will be justified below. 


Definition 9.41 (copula model for default times). Let $(\gamma_{t,1}), \ldots, (\gamma_{t,m})$ be 
nonnegative, $(\mathcal{F}_t)$-adapted processes such that $\Gamma_{t,i} < \infty$ for all $t > 0$, and let $C$ be an 
$m$-dimensional copula. Then the random times $\tau_1, \ldots, \tau_m$ follow a copula model 
with marginal hazard-rate processes $(\gamma_{t,i})$, $i = 1, \ldots, m$, and conditional survival 
copula $C$, if there is an $m$-dimensional random vector $U \sim C$, independent of $\mathcal{F}_\infty$, 
such that 

$$\tau_i = \inf\{t \ge 0 : \exp(-\Gamma_{t,i}) \le U_i\}, \quad 1 \le i \le m. \tag{9.63}$$


Note that Definition 9.41 provides an obvious way to simulate a copula model of 
default, provided we know how to simulate the copula $C$. To simulate a realization of 
$\tau_1, \ldots, \tau_m$ we generate a realization of the hazard-rate processes $(\gamma_{t,1}), \ldots, (\gamma_{t,m})$ 
and, independently, a realization of the random vector $U$; the $\tau_i$ are then constructed 
according to (9.63). 
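The simulation recipe above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that is not part of Definition 9.41: the hazard rates are taken to be constant, so that $\Gamma_{t,i} = \gamma_i t$ and the infimum in (9.63) has the closed form $\tau_i = -\ln(U_i)/\gamma_i$. The function names and the two toy copula samplers (independence and comonotonicity) are hypothetical choices made for the example.

```python
import math
import random


def simulate_default_times(hazard_rates, sample_u, rng):
    """Default times tau_i = inf{t >= 0 : exp(-gamma_i * t) <= U_i}.

    Assumes constant hazard rates gamma_i > 0, so the infimum in (9.63)
    is attained at tau_i = -ln(U_i) / gamma_i.
    """
    u = sample_u(len(hazard_rates), rng)
    return [-math.log(u_i) / g for u_i, g in zip(u, hazard_rates)]


def independence_copula(m, rng):
    # Independent uniforms; 1 - random() lies in (0, 1], which avoids log(0).
    return [1.0 - rng.random() for _ in range(m)]


def comonotonicity_copula(m, rng):
    # One common uniform threshold for all firms (perfect positive dependence).
    u = 1.0 - rng.random()
    return [u] * m


rng = random.Random(42)
gammas = [0.02, 0.02, 0.05]
taus_independent = simulate_default_times(gammas, independence_copula, rng)
taus_comonotone = simulate_default_times(gammas, comonotonicity_copula, rng)
```

With the independence sampler the default times are independent exponentials, in line with the remark that conditionally independent defaults correspond to the independence copula; with the comonotone sampler firms with equal hazard rates default at the same instant.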

The crucial part in setting up a copula model is the choice of the threshold cop- 
ula C. Useful copulas and the resulting copula models will be discussed in the next 
subsection; for the moment we merely recall that we obtain conditionally indepen- 
dent default times if and only if we take C to be the independence copula. 

We now collect some elementary consequences of Definition 9.41. Since 
$E_i := -\ln(1 - \tilde{U}_i) = -\ln U_i \sim \mathrm{Exp}(1)$, Lemma 9.12, together with (9.63), immediately yields that 
each of the $\tau_i$ is a doubly stochastic random time with $(\mathcal{F}_t)$-conditional hazard-rate 
process $(\gamma_{t,i})$. Hence $M_{t,i} := Y_{t,i} - \Gamma_{t \wedge \tau_i, i}$ is a martingale with respect to the 
filtration $(\mathcal{G}^i_t) := (\mathcal{F}_t) \vee (\mathcal{H}^i_t)$. Unless $C$ is the independence copula, it is, however, 
not true that $M_{t,i}$ is also a martingale with respect to the filtration $(\mathcal{G}_t)$, i.e. given 
default information for other obligors as well (see Section 9.8.1 for details). 

At time $t = 0$ the marginal distribution of the $\tau_i$ can be computed as in the 
single-firm case. We have, using iterated conditional expectations, 

$$P(\tau_i \le T) = E\big(P(\tau_i \le T \mid \mathcal{F}_\infty)\big) = 1 - E\Big(\exp\Big(-\int_0^T \gamma_{s,i}\,ds\Big)\Big). \tag{9.64}$$

In particular, at time $t = 0$ it is possible to calibrate the model to a given term 
structure of credit or CDS spreads by calibrating each of the marginal hazard-rate 
processes $(\gamma_{t,i})$ using methods for the single-firm case. This is an important feature 
of the model in practical applications. Note, however, that for $t > 0$ the conditional 
distribution of $\tau_i$ given the default history of all obligors in the portfolio generally 
differs from the conditional distribution of $\tau_i$ given that $\tau_i > t$, so for $t > 0$ the 
single-firm and the portfolio versions of the model differ. We discuss this point in 
more detail in Section 9.8.1 below. 
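To make the name-by-name calibration step concrete, here is a minimal sketch for the special case of a deterministic, piecewise-constant hazard rate, for which (9.64) reduces to $P(\tau_i \le T) = 1 - \exp(-\int_0^T \gamma_i(s)\,ds)$. The inputs are hypothetical survival probabilities at a few maturities; extracting such probabilities from bond or CDS quotes is the single-firm problem referred to in the text and is not attempted here. The function names are illustrative.

```python
import math


def bootstrap_hazard_rates(maturities, survival_probs):
    """Piecewise-constant hazard rates that reproduce the given survival probabilities.

    maturities must be increasing and positive; survival_probs must be
    decreasing and lie in (0, 1).
    """
    rates = []
    prev_t, prev_cum = 0.0, 0.0
    for t, s in zip(maturities, survival_probs):
        cum = -math.log(s)  # cumulative hazard up to time t
        rates.append((cum - prev_cum) / (t - prev_t))
        prev_t, prev_cum = t, cum
    return rates


def survival_probability(t, maturities, rates):
    """exp(-cumulative hazard) at time t for the piecewise-constant hazard rate."""
    cum, prev_t = 0.0, 0.0
    for t_k, g in zip(maturities, rates):
        if t <= t_k:
            return math.exp(-(cum + g * (t - prev_t)))
        cum += g * (t_k - prev_t)
        prev_t = t_k
    # Flat extrapolation of the last hazard rate beyond the final maturity.
    return math.exp(-(cum + rates[-1] * (t - prev_t)))


maturities = [1.0, 3.0, 5.0]
survival_probs = [0.98, 0.93, 0.87]  # hypothetical inputs
rates = bootstrap_hazard_rates(maturities, survival_probs)
```

Because the copula does not affect the margins at $t = 0$, this step can be carried out separately for every obligor before any dependence parameter is chosen.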

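Next we show that the threshold copula $C$ is in fact the survival copula of the $\tau_i$ 
conditional on $\mathcal{F}_\infty$. By definition we have 

$$\bar{F}_{i \mid \mathcal{F}_\infty}(t) := P(\tau_i > t \mid \mathcal{F}_\infty) = \exp\Big(-\int_0^t \gamma_{s,i}\,ds\Big).$$

Moreover, according to (9.63), $\tau_i > t$ if and only if $U_i < \bar{F}_{i \mid \mathcal{F}_\infty}(t)$. Hence we 
obtain, using the independence of $U$ and $\mathcal{F}_\infty$, 

$$P(\tau_1 > t_1, \ldots, \tau_m > t_m \mid \mathcal{F}_\infty) = P\big(U_1 < \bar{F}_{1 \mid \mathcal{F}_\infty}(t_1), \ldots, U_m < \bar{F}_{m \mid \mathcal{F}_\infty}(t_m) \mid \mathcal{F}_\infty\big)$$
$$= C\big(\bar{F}_{1 \mid \mathcal{F}_\infty}(t_1), \ldots, \bar{F}_{m \mid \mathcal{F}_\infty}(t_m)\big). \tag{9.65}$$

By Sklar's identity for survival functions (see (5.13)), $C$ is thus the conditional 
survival copula of the $\tau_i$ given $\mathcal{F}_\infty$. 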


Models with deterministic hazard rates. From now on we concentrate on models 
where the marginal hazard rate $\gamma_i(s)$ is deterministic. Since dependence between 
the $\tau_i$ can be introduced via the threshold copula $C$, this gives rise to interesting models. 
In fact, the literature on copula models focuses almost exclusively on models 
with deterministic marginal hazard rates. Moreover, understanding the properties 
of models with deterministic hazard rates is an important step in the analysis of 
more general models with stochastic hazard rates. These models are usually studied 
first under the artificial filtration $(\tilde{\mathcal{G}}_t)$, with $\tilde{\mathcal{G}}_t = \mathcal{F}_\infty \vee \mathcal{H}_t$, $t \ge 0$, for which 
hazard rates are deterministic; pricing formulas with respect to the smaller filtration 
$\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{H}_t$, $t \ge 0$, are then derived using the theorem of iterated conditional 
expectations. 

With deterministic marginal hazard rates $\gamma_i(t)$, the default times $\tau_i$ are independent 
of the background filtration $(\mathcal{F}_t)$, and we may restrict our attention to the 
filtration $(\mathcal{H}_t)$, which is generated by the default indicators. Moreover, in that case 
the conditional survival function with respect to $\mathcal{F}_\infty$ and the unconditional survival 
functions obviously coincide: we have $\bar{F}_i(t) = \bar{F}_{i \mid \mathcal{F}_\infty}(t) = \exp(-\Gamma_i(t))$. Hence 
relation (9.65) yields 

$$\bar{F}(t_1, \ldots, t_m) = C\big(\bar{F}_1(t_1), \ldots, \bar{F}_m(t_m)\big) \tag{9.66}$$
$$= C\Big(\exp\Big(-\int_0^{t_1} \gamma_1(s)\,ds\Big), \ldots, \exp\Big(-\int_0^{t_m} \gamma_m(s)\,ds\Big)\Big). \tag{9.67}$$


Relation (9.66) shows that with deterministic marginal hazard rates the conditional 
survival copula $C$ is the survival copula of the default times; (9.67) shows how this 
copula and the marginal hazard rates determine the joint survival function of the 
default times. We will use both relations frequently below. From a mathematical 
point of view it makes no difference whether we specify the copula and marginal 
hazard rates or the joint survival function $\bar{F}$ directly, because every joint survival 
function $\bar{F}$ with absolutely continuous marginal distributions has a unique representation 
of the form (9.67) (put $\gamma_i(t) = -(\partial/\partial t) \ln \bar{F}_i(t)$ and define $C$ using Sklar's 
identity for survival functions). When deriving mathematical results we will therefore 
work directly with $\bar{F}$. The representation (9.67) is, however, convenient for the 
calibration of the model, as will be discussed below. 
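As a small illustration of (9.66) and (9.67), the following sketch evaluates the joint survival function for constant hazard rates and a Clayton survival copula, $C(u_1, \ldots, u_m) = (u_1^{-\theta} + \cdots + u_m^{-\theta} - m + 1)^{-1/\theta}$ with $\theta > 0$. Both the constant hazard rates and the choice of the Clayton family are assumptions made for the example; any other copula could be passed in as a function.

```python
import math


def clayton_copula(u, theta):
    """Clayton copula C(u_1, ..., u_m) for theta > 0."""
    if any(u_i == 0.0 for u_i in u):
        return 0.0
    s = sum(u_i ** (-theta) for u_i in u) - len(u) + 1.0
    return s ** (-1.0 / theta)


def joint_survival(times, hazard_rates, copula):
    """Joint survival function (9.67) for constant hazard rates gamma_i."""
    marginals = [math.exp(-g * t) for g, t in zip(hazard_rates, times)]
    return copula(marginals)


hazard_rates = [0.02, 0.03, 0.05]
theta = 2.0
survival_5y = joint_survival(
    [5.0, 5.0, 5.0], hazard_rates, lambda u: clayton_copula(u, theta)
)
```

Setting all but one time argument to zero recovers the corresponding marginal survival probability, which is the property that allows margins and dependence to be calibrated separately.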

Finally, a word of warning is in order: for models with stochastic hazard rates, 
the unconditional survival copula of $(\tau_1, \ldots, \tau_m)'$ is different from the conditional 
survival copula given $\mathcal{F}_\infty$; for example, in models with conditionally independent 
defaults but dependent hazard-rate processes the conditional survival copula given 
$\mathcal{F}_\infty$ is the independence copula, but $(\tau_1, \ldots, \tau_m)'$ is obviously not a vector of 
independent rvs. 


Static version. It is interesting to link copula models to the static threshold models 
considered in Section 8.3. Fix some horizon $T > 0$. Obviously, $Y_{T,i} = 1$ if and 
only if $\tau_i \le T$, so $Y_T$ follows a threshold model in the sense of Definition 8.4 with 
critical variables $X = (\tau_1, \ldots, \tau_m)'$ and default threshold $T$. By (9.66) the survival 
copula of $(\tau_1, \ldots, \tau_m)'$ equals $C$; if $C$ is radially symmetric (see Definition 5.13), 
$C$ is also the copula of $(\tau_1, \ldots, \tau_m)'$. This is true, in particular, if $C$ is an elliptical 
copula such as the Gauss copula or the $t$ copula. The findings of Section 8.3.5 on 
the implications of the choice of $C$ for the portfolio loss distribution thus carry 
over to dynamic models. In particular, if $C$ is an elliptical copula, increasing the 
degree of dependence in the tail of $C$ or choosing higher asset correlations leads to 
a heavier-tailed distribution for $M_T = \sum_{i=1}^m Y_{T,i}$. 


On calibration. The calibration of a copula model with deterministic marginal 
hazard rates for pricing purposes proceeds in two steps. Marginal risk-neutral hazard 
rates are calibrated to a given term structure of credit spreads from defaultable bonds 
or CDS spreads, as described in Section 9.3.3. If there is a liquid market for portfolio- 
related credit derivatives, the parameters of the threshold copula C can be calibrated 
to the observed prices of these products. While this is a straightforward concept, the 
technical details of this procedure can be quite involved (see Notes and Comments 
for references). 

Otherwise one typically calibrates the copula to estimates of default correlation 
over the maturity of the products to be priced; such estimates are either obtained 
using asset correlations in conjunction with the multivariate version of the Merton 
model introduced in Section 8.2.4, or via one of the statistical procedures described 
in Section 8.6. Note that in this approach it is implicitly assumed that risk-neutral 
and historical default correlations are equal, which is a strong assumption. When we 
calibrate a copula model under the real-world probability measure, hazard rates are 
calibrated to estimates of historical default probabilities; parameters of the copula 
are again calibrated to estimates of historical default correlations. 


9.7.2 Factor Copula Models 


In this section we consider models where the threshold vector $U$ has a conditional 
independence structure in the sense of Definition 8.18, i.e. where there is a 
$p$-dimensional random vector $V$, $p < m$, such that, conditional on $V$, the $U_i$ are 
independent. These models are sometimes called factor copula models. Under our 
assumption of deterministic marginal hazard rates $\gamma_i(t)$, we get from (9.63) and the 
conditional independence of the $U_i$ given $V$ that 

$$\bar{F}(t_1, \ldots, t_m) = E\big(P(U_1 < \bar{F}_1(t_1), \ldots, U_m < \bar{F}_m(t_m) \mid V)\big) = E\Big(\prod_{i=1}^m P(U_i < \bar{F}_i(t_i) \mid V)\Big). \tag{9.68}$$

Denote by $\bar{F}_{\tau_i \mid V}(t \mid v)$ the conditional survival function of $\tau_i$ given $V = v$ and note 
that, by construction, $\bar{F}_{\tau_i \mid V}(t \mid v) = P(U_i < \bar{F}_i(t) \mid V = v)$. Hence 

$$\bar{F}(t_1, \ldots, t_m) = E\Big(\prod_{i=1}^m \bar{F}_{\tau_i \mid V}(t_i \mid V)\Big). \tag{9.69}$$

Denoting by $G_V$ the df of $V$ and by $g_V$ the density (if it exists), we will sometimes 
write (9.69) more explicitly as 

$$\bar{F}(t_1, \ldots, t_m) = \int_{\mathbb{R}^p} \prod_{i=1}^m \bar{F}_{\tau_i \mid V}(t_i \mid v)\,dG_V(v) = \int_{\mathbb{R}^p} \prod_{i=1}^m \bar{F}_{\tau_i \mid V}(t_i \mid v)\,g_V(v)\,dv.$$


Note that the representation (9.69) is analogous to the representation of static 
one-period threshold models with conditional independence structure as Bernoulli 
mixture models, obtained in Section 8.4.4. In particular, (9.69) shows that for $T$ 
fixed the default indicators follow a Bernoulli mixture model with factor vector $V$ 
and conditional default probabilities $Q_i(v) = 1 - \bar{F}_{\tau_i \mid V}(T \mid v)$. Following standard 
terminology from survival analysis the unobservable vector $V$ is sometimes termed 
the frailty of the default times. As in the case of static models, the mixture-model 
representation of a factor copula model is very useful. It leads to a natural interpretation 
of default contagion in terms of incomplete information (see Section 9.8.2 
below for details). Moreover, it can be used for simulation purposes; we sketch the 
algorithm below. 


Algorithm 9.42 (simulation of factor copula models). 


(1) Generate a realization of V. 


(2) Generate independent rvs $\tau_i$ with df $1 - \bar{F}_{\tau_i \mid V}(t \mid V)$, $1 \le i \le m$. In order to 
generate a sequence $(T_n, \xi_n)$, $T_n \le T$, of default times up to some maturity 
date, one might use recursive generation of default times (Algorithm 9.38). 
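Algorithm 9.42 can be sketched as follows. The example assumes the one-factor Gauss specification that the text introduces in Example 9.43 (with a common $\rho$) and constant hazard rates $\gamma_i$; both are simplifying assumptions for illustration, and the function name is hypothetical. Step (2) uses the fact that, conditional on $V = v$, the rv $\bar{F}_{\tau_i \mid V}(\tau_i \mid v)$ is uniform, so $\tau_i$ is obtained by drawing a uniform $W$ and inverting the conditional survival function.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_one_factor_gauss(hazard_rates, rho, rng):
    """One draw of (tau_1, ..., tau_m) following the two steps of Algorithm 9.42."""
    # Step (1): generate a realization of the factor V.
    v = rng.gauss(0.0, 1.0)
    taus = []
    for g in hazard_rates:
        # Step (2): draw W uniform on (0, 1) and solve
        #   Phi((d_i(t) - sqrt(rho) v) / sqrt(1 - rho)) = W  for t,
        # where d_i(t) = Phi^{-1}(exp(-g t)).
        w = min(max(rng.random(), 1e-16), 1.0 - 1e-16)
        d = math.sqrt(rho) * v + math.sqrt(1.0 - rho) * STD_NORMAL.inv_cdf(w)
        marginal_survival = STD_NORMAL.cdf(d)  # equals exp(-g * tau_i)
        taus.append(-math.log(marginal_survival) / g)
    return taus


rng = random.Random(2024)
sample = simulate_one_factor_gauss([0.05, 0.05, 0.08], rho=0.3, rng=rng)
```

In this particular model the inversion simply reproduces $U_i = \Phi(\sqrt{\rho}\,V + \sqrt{1 - \rho}\,\varepsilon_i)$, but writing it as a conditional inversion shows how the same two steps apply to other factor copulas.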


The importance-sampling techniques discussed in Section 8.5 in the context of 
static Bernoulli mixture models can be employed to improve the performance of 
Algorithm 9.42. These techniques are particularly useful if one deals with rare-event 
simulation, such as in the pricing of CDO tranches with high attachment points. 

At first sight, the mathematical structure of factor copula models looks very similar 
to the structure of models with conditionally independent defaults; in particular, 
the static versions of both model classes are Bernoulli mixture models. However, 
the model classes differ with respect to the way information is revealed to investors 
over time, which leads to completely different dynamic behaviour. In the models 
with conditionally independent defaults it is assumed that the economic factor process 
$(\Psi_t)$ is $(\mathcal{F}_t)$-adapted, i.e. that its current value is known to investors at time $t$. 
Hence a default event does not convey additional information for predicting the 
default of other obligors. 

In the factor copula models, on the other hand, the threshold $U$ and the frailty $V$ 
are assumed to be unobservable. Since default probabilities depend on $V$, default 
information such as the news that a particular obligor $j$ has defaulted at a given 
point in time $t$ does convey additional information about the distribution of $V$. The 
survival probabilities of the remaining obligors $i \ne j$ change, as they are computed 
as the average of the conditional survival function $\bar{F}_{\tau_i \mid V}(t \mid v)$ with respect to the 
conditional distribution of $V$ given the default history $\mathcal{H}_t$. The updating of the 
distribution of the unobservable random vector $V$ can lead to default contagion, as 
will be discussed in more detail in Section 9.8.2 below. This comparison between 
factor copula models and models with conditionally independent defaults shows 
that dynamic models do possess a much richer structure than static models. 

Below we consider specific examples of factor copula models. Obviously, every 
continuous multivariate distribution with $p$-dimensional conditional independence 
structure can be used to construct a factor copula model. Practically important examples 
include the Gauss copula $C_P^{\mathrm{Ga}}$, the $t$ copula $C_{\nu,P}^{t}$ (provided that the correlation 
matrix $P$ corresponds to a factor model, as explained in Section 3.4.1), and the 
LT-Archimedean copulas discussed in Section 5.4.2. We consider certain special cases; 
in particular, we derive the dynamic version of the two most important mixture models 
from Section 8.4, namely the probit-normal mixture model and CreditRisk+. 


Example 9.43 (one-factor Gauss copula). Factor copula models based on a Gauss 
copula $C_P^{\mathrm{Ga}}$ are frequently employed in practice. The static version of these models 
corresponds to the popular CreditMetrics/KMV-type models discussed in Example 8.6. 
Here we compute the conditional survival functions for the one-factor case. 
Let $X_i = \sqrt{\rho_i}\,V + \sqrt{1 - \rho_i}\,\varepsilon_i$, where $\rho_i \in (0, 1)$ and $V$, $(\varepsilon_i)_{1 \le i \le m}$ are iid standard 
normal rvs, so $X \sim N_m(0, P)$, with $(i, j)$th element of $P$ given by $\rho_{ij} = \sqrt{\rho_i \rho_j}$ for $i \ne j$. Set 
$U_i = \Phi(X_i)$, i.e. $U \sim C_P^{\mathrm{Ga}}$. The conditional survival function is easy to compute. 
With $d_i(t) := \Phi^{-1}(\bar{F}_i(t))$, we have that 

$$\bar{F}_{\tau_i \mid V}(t \mid v) = P(U_i < \bar{F}_i(t) \mid V = v) = P\Big(\varepsilon_i < \frac{d_i(t) - \sqrt{\rho_i}\,v}{\sqrt{1 - \rho_i}} \,\Big|\, V = v\Big),$$

leading to $\bar{F}_{\tau_i \mid V}(t \mid v) = \Phi\big((d_i(t) - \sqrt{\rho_i}\,v)/\sqrt{1 - \rho_i}\big)$. Hence 

$$\bar{F}(t_1, \ldots, t_m) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \prod_{i=1}^m \Phi\Big(\frac{d_i(t_i) - \sqrt{\rho_i}\,v}{\sqrt{1 - \rho_i}}\Big) e^{-v^2/2}\,dv, \tag{9.70}$$

which is easily computed using one-dimensional numerical integration. The conditional 
survival functions for general mean-variance mixture copulas with factor 
structure such as the $t$ copula can be derived by an analogous computation. 
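The one-dimensional integral (9.70) can be evaluated with a simple quadrature rule. The sketch below uses the trapezoidal rule on a truncated grid and assumes constant hazard rates, so that $\bar{F}_i(t) = \exp(-\gamma_i t)$; the grid size, the truncation point and the function name are arbitrary illustrative choices, and all time arguments must be strictly positive so that $d_i(t_i)$ is finite.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gauss_joint_survival(times, hazard_rates, rhos, n_grid=2001, v_max=8.0):
    """Joint survival function (9.70) by the trapezoidal rule on [-v_max, v_max]."""
    # d_i(t_i) = Phi^{-1}(Fbar_i(t_i)) with Fbar_i(t) = exp(-gamma_i * t).
    d = [STD_NORMAL.inv_cdf(math.exp(-g * t)) for g, t in zip(hazard_rates, times)]
    step = 2.0 * v_max / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        v = -v_max + k * step
        integrand = STD_NORMAL.pdf(v)  # (1 / sqrt(2 pi)) * exp(-v^2 / 2)
        for d_i, rho in zip(d, rhos):
            integrand *= STD_NORMAL.cdf(
                (d_i - math.sqrt(rho) * v) / math.sqrt(1.0 - rho)
            )
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * integrand
    return total * step


joint = gauss_joint_survival([5.0, 5.0], [0.05, 0.05], [0.3, 0.3])
```

For a single name the integral collapses to the marginal survival probability $\bar{F}_1(t_1)$, which gives a convenient check of the quadrature.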

In applications of a one-factor Gauss copula model to the pricing of portfolio 
credit derivatives it is frequently assumed that $\rho_i = \rho$ for all $i$. In that case the 
dependence structure of the model is governed by the single parameter $\rho$, the copula 
of $(\tau_1, \ldots, \tau_m)'$ is exchangeable and $\rho = \operatorname{corr}(X_i, X_j)$, so $\rho$ is readily interpreted 
in terms of asset correlation. This feature makes the exchangeable version of the 
one-factor Gauss copula popular with practitioners. In fact, it is common practice 
on CDO markets to quote prices for tranches of synthetic CDOs in terms of implied 
asset correlation, i.e. to quote the value of $\rho$ which, if plugged into an exchangeable 
one-factor Gauss copula model with marginal survival probabilities calibrated to the 
CDS spreads of the asset pool, yields the price of the tranche. References regarding 
the technical details of this procedure can be found in Notes and Comments. In this 
way, prices of CDO tranches can be made comparable across attachment points and 
asset pools, in much the same way that implied volatilities are used as a common 
yardstick on options markets. This is clearly convenient. Nonetheless, one should 
bear in mind that the dependence structure of the default times in a portfolio is a 
complex object which cannot, in general, be characterized by a single number (see, 
for example, Duffie (2004) for a discussion of this point). 


Example 9.44 (LT-Archimedean copulas). Recall from Definition 5.47 in Chapter 5 
that an LT-Archimedean copula is defined in terms of a positive rv $V$ with df 
$G_V$, Laplace-Stieltjes transform $\hat{G}_V$ and $G_V(0) = 0$ using the relation 

$$C(u_1, \ldots, u_m) = E\Big(\exp\Big(-V \sum_{i=1}^m \hat{G}_V^{-1}(u_i)\Big)\Big).$$

As usual, denote by $\bar{F}_i(t_i)$ the marginal survival function of $\tau_i$. We thus get the 
following joint survival function of $(\tau_1, \ldots, \tau_m)'$: 

$$\bar{F}(t_1, \ldots, t_m) = E\Big(\prod_{i=1}^m \exp\big(-V \hat{G}_V^{-1}(\bar{F}_i(t_i))\big)\Big), \tag{9.71}$$

which is obviously of the general form (9.69). Recall that in the special case of 
the Clayton copula with parameter $\theta$ we have $V \sim \mathrm{Ga}(1/\theta, 1)$; explicit formulas 
for $\hat{G}_V$ and $\hat{G}_V^{-1}$ for that case are given in Algorithm 5.48 for the simulation of 
LT-Archimedean copulas. Note that LT-Archimedean copulas are in general not radially 
symmetric, so the static version of a dynamic LT-Archimedean threshold copula 
model with survival function (9.71) is not a threshold model with Archimedean 
copula as discussed in Example 8.9. Nonetheless, default correlations in a dynamic 
LT-Archimedean threshold copula model are easily computed using (9.58) and the 
relation 

$$P(\tau_i > T, \tau_j > T) = \hat{G}_V\big(\hat{G}_V^{-1}(\bar{F}_i(T)) + \hat{G}_V^{-1}(\bar{F}_j(T))\big), \quad i \ne j.$$
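For the Clayton case the quantities in this example are available in closed form: with $V \sim \mathrm{Ga}(1/\theta, 1)$ one has $\hat{G}_V(t) = (1 + t)^{-1/\theta}$ and $\hat{G}_V^{-1}(u) = u^{-\theta} - 1$. The sketch below evaluates the pairwise survival relation, turns it into a correlation of the two default indicators (we assume this is the notion of default correlation that (9.58) refers to), and simulates default times from the frailty representation (9.71). Constant hazard rates and all function names are assumptions made for the illustration.

```python
import math
import random


def lt(t, theta):
    """Laplace-Stieltjes transform of V ~ Ga(1/theta, 1)."""
    return (1.0 + t) ** (-1.0 / theta)


def lt_inv(u, theta):
    """Inverse of the Laplace-Stieltjes transform."""
    return u ** (-theta) - 1.0


def pairwise_joint_survival(surv_i, surv_j, theta):
    """P(tau_i > T, tau_j > T) from the marginal survival probabilities."""
    return lt(lt_inv(surv_i, theta) + lt_inv(surv_j, theta), theta)


def default_correlation(surv_i, surv_j, theta):
    """Correlation of the default indicators of firms i and j over [0, T]."""
    cov = pairwise_joint_survival(surv_i, surv_j, theta) - surv_i * surv_j
    return cov / math.sqrt(surv_i * (1 - surv_i) * surv_j * (1 - surv_j))


def simulate_clayton_frailty(hazard_rates, theta, rng):
    """Draw V, then conditionally independent tau_i by inverting exp(-V * lt_inv(Fbar_i(t)))."""
    v = rng.gammavariate(1.0 / theta, 1.0)
    taus = []
    for g in hazard_rates:
        w = 1.0 - rng.random()  # uniform on (0, 1]
        marginal_survival = lt(-math.log(w) / v, theta)  # equals exp(-g * tau_i)
        taus.append(-math.log(marginal_survival) / g)
    return taus


theta = 1.0
s = math.exp(-0.05 * 5.0)
rho_default = default_correlation(s, s, theta)
```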
Example 9.45 (LT-Archimedean copulas with p-factor structure). As explained 
in detail in Section 5.4.2, an LT-Archimedean copula with $p$-factor structure is 
constructed from a $p$-dimensional random vector $V = (V_1, \ldots, V_p)'$ with independent 
strictly positive components and a matrix $A \in \mathbb{R}^{m \times p}$ with elements $a_{ij} > 0$ as 
follows: 

$$C(u_1, \ldots, u_m) = E\Big(\prod_{i=1}^m \exp\big(-a_i' V\, \hat{G}_i^{-1}(u_i)\big)\Big), \tag{9.72}$$

where $a_i$ is the $i$th row of $A$ and $\hat{G}_i$ is the Laplace-Stieltjes transform of the 
strictly positive rv $a_i' V$. The joint survival function $\bar{F}(t_1, \ldots, t_m)$ of the $\tau_i$ is then 
obtained from (9.72) by replacing $u_i$ with $\bar{F}_i(t_i)$. Expression (9.72) is fairly easy to 
evaluate if the Laplace-Stieltjes transform of the $V_j$ is available in closed form (see 
Section 5.4.3 for details). LT-Archimedean copulas with $p$-factor structure are useful 
factor copulas, since they allow for a more flexible dependence structure between the 
$\tau_i$ than the exchangeable standard LT-Archimedean copulas while retaining many 
of the computational advantages of the latter class. 

If the $V_j$ follow a gamma distribution with mean one, the static version of a 
generalized LT-Archimedean factor copula model is the popular CreditRisk+ model 
discussed in Section 8.4.2. In fact, for $T$ fixed, the default indicators $Y_{T,i}$, $1 \le i \le m$, 
are conditionally independent given $V$ with default probability 

$$p_i(V) = 1 - \bar{F}_{\tau_i \mid V}(T \mid V) = 1 - \exp\big(-a_i' V\, \hat{G}_i^{-1}(\bar{F}_i(T))\big).$$

This corresponds to the structure of CreditRisk+ as given in (8.39) with 

$$w_{ij} = \frac{a_{ij}}{\sum_{l=1}^p a_{il}} \quad \text{and} \quad k_i = \Big(\sum_{l=1}^p a_{il}\Big) \hat{G}_i^{-1}(\bar{F}_i(T)).$$


Notes and Comments 


The first copula model for portfolio credit risk was given by Li (2001); his model 
is based on the Gauss copula. General copula models were introduced for the first 
time in Schönbucher and Schubert (2001). Factor copula models for portfolio credit 
risk have been studied by Laurent and Gregory (2003). Models where the threshold 
copula is given by an LT-Archimedean copula with p-factor structure have been 
developed by Rogge and Schönbucher (2003). 

We have not said much about the important topic of pricing portfolio-related 
credit derivatives such as CDOs. This is mainly because, as this book goes to press, 
the market for these products and the methodology for pricing them is in a state of 
rapid development. Hence any summary of the current status of this topic is likely 
to become outdated quickly. Given the practical relevance of the subject, we try to 
compensate somewhat for this omission by briefly discussing the available litera- 
ture. For asset-based CDOs pricing is typically done using Monte Carlo simulation. 
Semianalytic approaches for the pricing of synthetic CDOs in factor copula models 
have been developed by Laurent and Gregory (2003), Hull and White (2004) and 
Andersen and Sidenius (2004), among others. Laurent and Gregory exploit the conditional 
independence structure of factor copula models and develop methods based 
on Fourier analysis; Andersen and Sidenius and Hull and White propose recursive 
methods. 

The recent introduction of a quoted market for standardized synthetic CDO 
tranches has made the calibration of copula models to observable market prices 
an issue of high priority amongst financial engineers working in credit markets. 
In this context a deficiency of the exchangeable Gauss copula model has become 
apparent: the value of the implied asset correlation $\rho$ needed to explain observable 
market quotes varies with the attachment points of the tranches; in particular, $\rho$ is 
quite high for senior mezzanine and senior tranches. This phenomenon, which is 
frequently called base-correlation skew, bears some similarities to the well-known 
smile and skew patterns of implied volatility on options markets (see, for exam- 
ple, Dumas, Fleming and Whaley 1998). Base-correlation skews on CDO markets 
are discussed by McGinty et al. (2004), Andersen and Sidenius (2004) and oth- 
ers; the latter paper develops several extensions of the standard one-factor Gauss 
copula model that can be used to explain the base-correlation skew. A comparative 
analysis of copula-based CDO pricing models is done in Burtschell, Gregory and 
Laurent (2005). As mentioned previously, the methodology for pricing CDOs and 
related products is developing rapidly. A good place to monitor new developments 
is www.defaultrisk.com/. 


9.8 Default Contagion in Reduced-Form Models 


In this section we discuss default contagion in reduced-form models. We begin with 
a detailed analysis of default contagion in general models for dependent defaults; 
information-based default contagion in factor copula models is discussed in Sec- 
tion 9.8.2. In Section 9.8.3 we briefly look at models with interacting intensities, 
where default contagion and counterparty risk are modelled explicitly. 


9.8.1 Default Contagion and Default Dependence 


Martingale intensities. We start with a general result which characterizes the martingale 
default intensities of dependent default times. As we have seen before, when 
discussing the martingale property of stochastic processes we have to be precise 
about the information available to investors, or, in mathematical terms, the filtration 
we use. Here we assume that investors only have access to the default history 
of firms in the portfolio under consideration, i.e. we are interested in martingale 
properties with respect to the internal filtration $(\mathcal{H}_t)$ introduced in (9.50). Note that 
$\mathcal{H}_t$ can be described as $\mathcal{H}_t = \sigma(\{(T_n, \xi_n) : T_n \le t\})$, as the sequence $(T_n, \xi_n)$, 
$T_n \le t$, gives an alternative description of the default history up to time $t$. By $\mathcal{H}_{T_n}$ 
we denote the $\sigma$-algebra of events observable up to and including the $n$th default 
time $T_n$, i.e. $\mathcal{H}_{T_n} = \sigma(\{(T_j, \xi_j) : 1 \le j \le n\})$. (This coincides with the general 
abstract definition of the $\sigma$-algebra of events observable up to some stopping time.) 


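Theorem 9.46. Consider default times $\tau_1, \ldots, \tau_m$ and denote by $(\mathcal{H}_t)$ the corresponding 
internal filtration. Suppose that for every $0 \le n \le m - 1$ and every 
$i \in \{1, \ldots, m\}$ there is a random mapping $g_i^{(n)} : \Omega \times \mathbb{R}_+ \to \mathbb{R}_+$, measurable with 
respect to the product $\sigma$-algebra $\mathcal{H}_{T_n} \otimes \mathcal{B}(\mathbb{R}_+)$, such that 

$$P(T_{n+1} - T_n \le s,\ \xi_{n+1} = i \mid \mathcal{H}_{T_n})(\omega) = \int_0^s g_i^{(n)}(\omega, u)\,du, \quad 1 \le i \le m. \tag{9.73}$$

Then the martingale default intensity of $(Y_{t,i})$ with respect to $(\mathcal{H}_t)$ is given by 

$$\lambda_{t,i}(\omega) = \frac{g_i^{(n)}(\omega, t - T_n)}{P(T_{n+1} > t \mid \mathcal{H}_{T_n})(\omega)}, \quad T_n < t \le T_{n+1}. \tag{9.74}$$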


The proof of this result is beyond the scope of this text. In Notes and Comments 
we reference several texts in which a proof of Theorem 9.46 and extensions to copula 
models with stochastic marginal hazard rates can be found. 


Comments. The measurability requirement on the random function $g_i^{(n)}$ simply 
means that the functional form of $g_i^{(n)}(\omega, \cdot)$ depends only on the default history $\mathcal{H}_{T_n}$. 
We will see below that (9.73) is always satisfied if the vector $(\tau_1, \ldots, \tau_m)'$ admits a 
joint density. 

The form (9.74) for the martingale intensity is in fact quite natural. If investors 
observe only past and present defaults, they obtain significant new information only 
at the time points $T_1(\omega), \ldots, T_m(\omega)$. Hence we expect the martingale default intensity 
$(\lambda_{t,i})$ of some firm $i \in A_n$ (a surviving firm) to evolve in a deterministic fashion 
for $t \in (T_n, T_{n+1}]$ and to change with the random arrival of new information at $T_{n+1}$. 
Moreover, it is possible to derive a different expression for $(\lambda_{t,i})$ which resembles 
more closely the common notion of an intensity as "the conditional probability of 
default in the next instant". Applying the fundamental theorem of calculus and (9.73) 
we get, for $t \in [T_n, T_{n+1})$ and arbitrary $n < m$, 

$$g_i^{(n)}(\omega, t - T_n) = \lim_{h \to 0+} \frac{1}{h} \int_{t - T_n}^{t - T_n + h} g_i^{(n)}(\omega, u)\,du = \lim_{h \to 0+} \frac{1}{h} P(T_{n+1} \in (t, t + h],\ \xi_{n+1} = i \mid \mathcal{H}_{T_n}).$$

Hence we get, for a surviving firm $i \in A_n$, using (9.74), 

$$\lambda_{t,i} = \lim_{h \to 0+} \frac{1}{h} P(\tau_i \le t + h \mid \{\tau_j > t \text{ for all } j \in A_n\}, \mathcal{H}_{T_n}). \tag{9.75}$$

Now note that at time $t \in [T_n, T_{n+1})$ default information consists of $\mathcal{H}_{T_n}$ and the 
atom $B := \{\tau_j > t \text{ for all } j \in A_n\}$. If we denote by $\bar{F}_{\tau_i \mid \mathcal{H}_t}(T) := P(\tau_i > T \mid \mathcal{H}_t)$ 
the conditional survival probability of firm $i$ given the default history up to time $t$, 
we thus get, on $\{\tau_i > t\}$, 

$$\lambda_{t,i} = \lim_{h \to 0+} \frac{1}{h} P(\tau_i \le t + h \mid \mathcal{H}_t) = -\frac{\partial}{\partial T}\Big|_{T = t} \bar{F}_{\tau_i \mid \mathcal{H}_t}(T). \tag{9.76}$$

Hence $\lambda_{t,i}$ gives the instantaneous conditional probability that a surviving firm 
$i \in A_n$ defaults at time $t$ given the default history of all firms in the portfolio up to 
time $t$. Below we explain how the conditional survival function $\bar{F}_{\tau_i \mid \mathcal{H}_t}$ can be expressed in 
terms of derivatives of the unconditional survival function of $(\tau_1, \ldots, \tau_m)'$. 


Remark 9.47 (martingale intensities and marginal hazard rates). Consider 
random times $\tau_1, \ldots, \tau_m$ following a threshold copula model with deterministic 
marginal hazard rates $\gamma_1(t), \ldots, \gamma_m(t)$ and a survival copula $C$ admitting a density. 
In this case the assumptions from Theorem 9.46 are satisfied. However, for $t > 0$ the 
martingale intensity $\lambda_{t,i}$ is in general different from the marginal hazard rate $\gamma_i(t)$. 
As explained earlier, this shows that $Y_{t,i} - \int_0^{t \wedge \tau_i} \gamma_i(s)\,ds$ is an $(\mathcal{H}^i_t)$-martingale but 
not a martingale with respect to the full default information $(\mathcal{H}_t)$. To see that in 
general, for $t > 0$, $\lambda_{t,i} \ne \gamma_i(t)$, recall from Section 9.2.1 that 

$$\gamma_i(t) = \lim_{h \to 0+} \frac{1}{h} P(\tau_i \le t + h \mid \tau_i > t). \tag{9.77}$$

Hence $\gamma_i(t)$ gives the instantaneous default probability of firm $i$ given $\tau_i > t$, 
whereas $\lambda_{t,i}$ gives the instantaneous default probability given $\tau_i > t$ and the default 
history of all other firms in the portfolio. With default dependence the two conditional 
expectations will typically differ; a numeric illustration is given in Example 9.50 
below. With (conditionally) independent defaults on the other hand, the additional 
information about the default history of firms $j \ne i$ in the portfolio is of no use 
in predicting the default time of firm $i$, and we have $\lambda_{t,i} = \gamma_i(t)$, as was shown 
formally in Proposition 9.39. 


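Conditional survival functions. Let $T_n \le t < T_{n+1}$ for some $0 \le n \le m - 1$. We 
want to compute the conditional survival function $\bar{F}_{\tau_i \mid \mathcal{H}_t}$ for some firm $i \in A_n$. To 
simplify the notation we assume from now on that the indices have been permuted 
in such a way that $A_n^c = \{1, \ldots, n\}$ and $A_n = \{n + 1, \ldots, m\}$, i.e. the defaulted 
firms correspond to the first $n$ firms in our index set. Put $\boldsymbol{\tau}_1 = (\tau_1, \ldots, \tau_n)'$ and 
$\boldsymbol{\tau}_2 = (\tau_{n+1}, \ldots, \tau_m)'$. As an intermediate step we consider $\bar{F}_{\boldsymbol{\tau}_2 \mid \boldsymbol{\tau}_1}(t_1, \ldots, t_{m-n} \mid \boldsymbol{\tau}_1)$, 
the conditional survival function of the last $m - n$ firms given the vector of the default 
times of the first $n$ firms. We have the following lemma. 


Lemma 9.48. Assume that the vector $(\tau_1, \ldots, \tau_m)'$ has a density. Then 

$$\bar{F}_{\boldsymbol{\tau}_2 \mid \boldsymbol{\tau}_1}(t_1, \ldots, t_{m-n} \mid \tau_1, \ldots, \tau_n) = \frac{\dfrac{\partial^n}{\partial t_1 \cdots \partial t_n} \bar{F}(\tau_1, \ldots, \tau_n, t_1, \ldots, t_{m-n})}{\dfrac{\partial^n}{\partial t_1 \cdots \partial t_n} \bar{F}(\tau_1, \ldots, \tau_n, 0, \ldots, 0)}.$$

Here the partial derivatives are taken with respect to the first $n$ arguments of $\bar{F}$ 
and evaluated at $(\tau_1, \ldots, \tau_n)$. 

Proof. Recall that the joint density of $(\tau_1, \ldots, \tau_m)'$ is given by 

$$(-1)^m \frac{\partial^m \bar{F}}{\partial t_1 \cdots \partial t_m}.$$

Hence the result follows from the conditional density formula (3.2) for the conditional 
density of $\boldsymbol{\tau}_2$ given $\boldsymbol{\tau}_1$; we omit the details. 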


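Finally, we turn to the conditional survival function $\bar{F}_{\tau_i \mid \mathcal{H}_t}$. At time $t$ default 
information consists of the vector $\boldsymbol{\tau}_1$ of the default times of the firms from $A_n^c$ and 
of the atom $B := \{\boldsymbol{\tau}_2 > t\}$ (all components of $\boldsymbol{\tau}_2$ exceed $t$). Hence we have, for 
$i \in \{n + 1, \ldots, m\}$ and $T \ge t$, 
using the definition of elementary conditional expectation and Lemma 9.48, that 

$$\bar{F}_{\tau_i \mid \mathcal{H}_t}(T) = P(\tau_i > T \mid B, \boldsymbol{\tau}_1) = \frac{P(\tau_i > T,\ \boldsymbol{\tau}_2 > t \mid \boldsymbol{\tau}_1)}{P(\boldsymbol{\tau}_2 > t \mid \boldsymbol{\tau}_1)} = \frac{\dfrac{\partial^n}{\partial t_1 \cdots \partial t_n} \bar{F}(\tau_1, \ldots, \tau_n, t, \ldots, T, \ldots, t)}{\dfrac{\partial^n}{\partial t_1 \cdots \partial t_n} \bar{F}(\tau_1, \ldots, \tau_n, t, \ldots, t)}, \tag{9.78}$$

where $T$ appears as the $i$th argument of $\bar{F}$ in the numerator. 
Combining (9.76) and (9.78) we can characterize martingale default intensities in 
terms of the unconditional survival function of $(\tau_1, \ldots, \tau_m)'$. 

Corollary 9.49. Suppose that the random vector $(\tau_1, \ldots, \tau_m)'$ admits a density. Let 
$T_n < t \le T_{n+1}$, $0 \le n < m$, and suppose that $A_n^c = \{1, \ldots, n\}$. Then the martingale 
default intensity of firm $i \in A_n$ with respect to $(\mathcal{H}_t)$ equals 

$$\lambda_{t,i} = -\frac{\dfrac{\partial^{n+1}}{\partial t_1 \cdots \partial t_n\,\partial t_i} \bar{F}(\tau_1, \ldots, \tau_n, t, \ldots, t)}{\dfrac{\partial^n}{\partial t_1 \cdots \partial t_n} \bar{F}(\tau_1, \ldots, \tau_n, t, \ldots, t)}.$$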

If we have a closed-form expression for the unconditional survival function F (or 
equivalently for the survival copula C) of (t1,..., Tm), then it is straightforward, 
in principle, to compute the martingale default intensities. However, Corollary 9.49 
conveys little economic intuition, as it expresses martingale intensities and hence 
default contagion in terms of purely mathematical objects, namely higher-order 
derivatives of the unconditional survival function. Moreover, it seems difficult to 
use the corollary in order to build a model where default contagion follows a partic- 
ular pattern. In Section 9.8.2 we will therefore study conditional survival functions 
in factor copula models, where default contagion permits a natural economic inter- 
pretation in terms of incomplete information. 
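
As a hedged illustration of how Corollary 9.49 can be evaluated in practice, the
following Python sketch approximates the ratio of partial derivatives by central finite
differences and compares it with the expression obtained by differentiating by hand.
Everything specific here is an assumption made for the example and is not taken from
the text: exponential margins with a common hazard rate `gamma`, a Clayton survival
copula with parameter `theta`, a portfolio of `m` firms, and a scenario in which firm 1
has defaulted at time `tau1` (so $n = 1$). The function names are hypothetical.

```python
import math

def survival(ts, gamma, theta):
    """Joint survival function F-bar(t_1, ..., t_m): Clayton survival copula
    applied to exponential marginal survival functions exp(-gamma * t)."""
    s = sum(math.exp(gamma * theta * t) for t in ts) - len(ts) + 1.0
    return s ** (-1.0 / theta)

def central_diff(f, ts, k, h):
    """Central finite difference of f with respect to coordinate k."""
    up = list(ts)
    dn = list(ts)
    up[k] += h
    dn[k] -= h
    return (f(up) - f(dn)) / (2.0 * h)

def intensity_fd(tau1, t, i, m, gamma, theta, h=1e-3):
    """Corollary 9.49 with n = 1: minus the ratio of the mixed derivative in
    (t_1, t_i) to the derivative in t_1, both evaluated at (tau1, t, ..., t).
    i is the zero-based coordinate of a surviving firm (1 <= i < m)."""
    f = lambda ts: survival(ts, gamma, theta)
    point = [tau1] + [t] * (m - 1)
    d_t1 = lambda ts: central_diff(f, ts, 0, h)   # dF/dt_1 as a function
    numerator = central_diff(d_t1, point, i, h)   # d^2 F / (dt_1 dt_i)
    denominator = d_t1(point)
    return -numerator / denominator

def intensity_closed_form(tau1, t, m, gamma, theta):
    """The same ratio, differentiated by hand for this particular F-bar."""
    s = math.exp(gamma * theta * tau1) + (m - 1) * (math.exp(gamma * theta * t) - 1.0)
    return gamma * (1.0 + theta) * math.exp(gamma * theta * t) / s
```

For example, `intensity_fd(0.5, 1.0, 2, 5, 0.02, 0.5)` and
`intensity_closed_form(0.5, 1.0, 5, 0.02, 0.5)` should agree up to discretization error.
Finite differences are used only to keep the sketch generic; with a closed-form $\bar F$
one would normally differentiate analytically.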


Applications to credit-risky securities. It is interesting to investigate the implications
of our general results for martingale intensities for the pricing of credit-risky
securities. Following the literature we use the martingale-modelling approach and
assume that, under the risk-neutral measure $Q$ used for pricing, the default times
$\tau_1, \dots, \tau_m$ follow a copula model with deterministic hazard rates $\gamma_1(t), \dots, \gamma_m(t)$
and survival copula $C$. Moreover, we assume that the risk-free short rate $r(t) > 0$
is deterministic; $B(t) = \exp(\int_0^t r(s)\,ds)$ denotes the default-free savings account.
The assumption of deterministic interest rates is routinely made in the literature 
on pricing portfolio credit derivatives, essentially because the impact of stochas- 
tic interest rates on prices is low compared with the impact of the assumptions on 
default dependence. 

We begin with the problem of pricing a first-to-default swap. We consider a similar
contract as in Section 9.6.3. Premiums are due at times $0 < t_1 < \cdots < t_N = T$,
provided that no default has yet occurred; if $T_1 < T$ and, moreover, $\xi_1 = i$, there is
a default payment equal to the constant $l_i$. In this set-up the value at time $t = 0$ of
the default payment leg equals

$$V^{\text{def}} = \sum_{i=1}^{m} l_i\, E^Q\big(B(\tau_i)^{-1} I_{\{\tau_i = T_1\}} I_{\{\tau_i < T\}}\big).$$

If we condition on $\tau_i$, we get, for a single term of this sum,

$$E^Q\big(B(\tau_i)^{-1} I_{\{\tau_i = T_1\}} I_{\{\tau_i < T\}}\big) = \int_0^T B(t)^{-1}\, Q(\tau_i = T_1 \mid \tau_i = t)\, f_i(t)\,dt,$$

where $f_i(t)$ is the marginal density of $\tau_i$. Now Lemma 9.48 yields

$$Q(\tau_i = T_1 \mid \tau_i = t) = Q(\tau_j > t \text{ for all } j \ne i \mid \tau_i = t) = -\frac{1}{f_i(t)} \frac{\partial}{\partial t_i} \bar F(t, \dots, t),$$

and we obtain

$$V^{\text{def}} = -\sum_{i=1}^{m} l_i \int_0^T B(t)^{-1} \frac{\partial}{\partial t_i} \bar F(t, \dots, t)\,dt.$$

If $\bar F$ or, equivalently, the threshold copula $C$ is known in closed form, this is
straightforward to compute by one-dimensional (numerical) integration. Note that, by
definition, $Q(T_1 > t_n) = \bar F(t_n, \dots, t_n)$; hence the value in $t = 0$ of the premium payments
(assuming a generic swap spread $x$) is given by

$$V^{\text{prem}} = x \sum_{n=1}^{N} B(t_n)^{-1} \bar F(t_n, \dots, t_n).$$
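
The two legs can be evaluated with a few lines of code. The sketch below is only an
illustration under assumptions that are not part of the text: constant hazard rates
`gammas`, a Clayton survival copula with parameter `theta`, a flat short rate `r`, and
the composite trapezoidal rule for the integral. The function names are hypothetical,
and the "fair spread" is simply the value of $x$ that equates the two legs at $t = 0$.

```python
import math

def _clayton_sum(t, gammas, theta):
    """Sum of u_j^(-theta) - m + 1 with u_j = exp(-gamma_j * t)."""
    return sum(math.exp(g * theta * t) for g in gammas) - len(gammas) + 1.0

def fbar_diag(t, gammas, theta):
    """F-bar(t, ..., t) = Q(T_1 > t)."""
    return _clayton_sum(t, gammas, theta) ** (-1.0 / theta)

def minus_dfbar_dti(t, i, gammas, theta):
    """-(d/dt_i) F-bar evaluated on the diagonal (t, ..., t)."""
    s = _clayton_sum(t, gammas, theta)
    return gammas[i] * math.exp(gammas[i] * theta * t) * s ** (-1.0 / theta - 1.0)

def default_leg(T, losses, gammas, theta, r, steps=2000):
    """V^def = sum_i l_i * int_0^T B(t)^(-1) * (-dF-bar/dt_i)(t, ..., t) dt."""
    h = T / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoidal end-point weights
        integrand = sum(l * minus_dfbar_dti(t, i, gammas, theta)
                        for i, l in enumerate(losses))
        total += weight * math.exp(-r * t) * integrand
    return total * h

def premium_leg_per_unit_spread(premium_dates, gammas, theta, r):
    """V^prem / x = sum_n B(t_n)^(-1) * F-bar(t_n, ..., t_n)."""
    return sum(math.exp(-r * tn) * fbar_diag(tn, gammas, theta)
               for tn in premium_dates)

def fair_spread(T, premium_dates, losses, gammas, theta, r):
    """Spread x for which default leg and premium leg have equal value."""
    return (default_leg(T, losses, gammas, theta, r)
            / premium_leg_per_unit_spread(premium_dates, gammas, theta, r))
```

A useful sanity check: with `r = 0` and all `losses` equal to one, the default leg
reduces to $1 - \bar F(T, \dots, T)$, the probability of at least one default before $T$.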

Next we consider the relationship between the instantaneous credit spread and
martingale intensities. Denote by $p_{1,i}(t, T)$ the price of a zero-coupon bond with
maturity $T$ issued by firm $i$ and assume that the recovery rate of this bond is equal
to zero. Hence the price of the bond at time $t < \tau_i$ is given by

$$p_{1,i}(t, T) = \exp\Big(-\int_t^T r(s)\,ds\Big)\, Q(\tau_i > T \mid \mathcal{H}_t), \qquad (9.79)$$

so that the credit spread is given by $c_i(t, T) = -\frac{1}{T - t} \ln Q(\tau_i > T \mid \mathcal{H}_t)$.
Since $Q(\tau_i > t \mid \mathcal{H}_t) = 1$ on $\{t < \tau_i\}$, by (9.76), the instantaneous credit spread
$c_i(t) = \lim_{T \to t} c_i(t, T)$ is given by

$$c_i(t) = -\frac{\partial}{\partial T}\Big|_{T=t} \ln Q(\tau_i > T \mid \mathcal{H}_t)
         = -\frac{\partial}{\partial T}\Big|_{T=t} Q(\tau_i > T \mid \mathcal{H}_t) = \lambda_{t,i},$$

i.e. the martingale default intensity of $\tau_i$ under the equivalent martingale measure $Q$
is equal to the instantaneous credit spread of a zero-recovery bond issued by firm $i$.

Finally, we discuss issues related to dynamic consistency in the use of copula
models. In Section 9.7.1 we showed that at $t = 0$ pricing formulas for single-name
products remain valid in a portfolio model; we also mentioned that this is no longer
true at $t > 0$. Here we take a closer look at this fact. Consider a corporate zero-coupon
bond with zero recovery. According to (9.79), the price of this security at $t > 0$ is
given by the discounted conditional survival probability of the firm given the default
history of all obligors up to time $t$. In a single-firm model, on the other hand, we have
$p_{1,i}(t, T) = I_{\{\tau_i > t\}} \exp(-\int_t^T r(s)\,ds)\, Q(\tau_i > T \mid \tau_i > t)$, i.e. the price of the bond
equals the discounted conditional survival probability of firm $i$ given only its own
default history. As discussed previously, these two conditional survival probabilities
generally differ for $t > 0$.

Note, however, that, while correct from a theoretical point of view, (9.79) does not
correspond to the way practitioners tend to use a copula model. In deriving (9.79)
we fixed in $t_0 = 0$ a model for the joint distribution of $(\tau_1, \dots, \tau_m)$, a model with
constant hazard rates and a Clayton survival copula with parameter $\theta_0$, say; at time
$t_1 > 0$ we priced the bond using the conditional distribution of this model given
the default history $\mathcal{H}_{t_1}$. Practitioners typically proceed in a different way: at time $t_1$
they calibrate a new model (in our case again a Clayton copula model but with
parameter $\theta_1$, which may be different from $\theta_0$) to the market information available
in $t_1$ and use the new model to price the bond. In general, the two approaches lead to
different distributions for the default times of surviving firms and hence to different
prices. Clearly, the second approach is inconsistent over time; however, it leads to
prices which are consistent with the available market information at any given point
in time, a property that practitioners regard as highly important.


9.8.2 Information-Based Default Contagion 


Default contagion in factor copula models can be attributed to the fact that informa- 
tion about the default history alters the conditional distribution of the unobservable 
factor vector V. In this section we make this statement precise and compute the 
conditional distribution of V given the default history up to and including time t. 
Moreover, we explain how the martingale default intensities at time $t$ can be computed
as expectations with respect to the conditional distribution of $V$. We assume
throughout that the conditional distribution of $\tau_i$ given $V$ admits the density

$$f_{\tau_i \mid V}(t \mid v) = -\frac{\partial}{\partial t} \bar F_{\tau_i \mid V}(t \mid v).$$

To simplify the exposition, we further assume that $V$ admits the density $g_V(v)$. By
$g_{V \mid \mathcal{H}_t}(v)$ we denote the conditional density of $V$ given $\mathcal{H}_t$.


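Computation of $g_{V \mid \mathcal{H}_t}(v)$. We begin with the case $t < T_1$. Using the definition of
elementary conditional expectation, we obtain for $A \subset \mathbb{R}^p$ that

$$P(V \in A \mid T_1 > t) = \frac{1}{\bar F(t, \dots, t)} \int_A \prod_{i=1}^{m} \bar F_{\tau_i \mid V}(t \mid v)\, g_V(v)\,dv.$$

Hence, for $t < T_1$, the conditional density of $V$ given $\mathcal{H}_t$ is given by

$$g_{V \mid \mathcal{H}_t}(v) = \frac{\prod_{i=1}^{m} \bar F_{\tau_i \mid V}(t \mid v)}{\bar F(t, \dots, t)}\, g_V(v), \quad t < T_1. \qquad (9.80)$$

Now we turn to the case $t \in [T_1, T_2)$. As an intermediary step we determine the
conditional density $g_{V \mid \tau_j}(v \mid t_j)$. The conditional density formula (3.2) gives

$$g_{V \mid \tau_j}(v \mid t_j) = \frac{f_{\tau_j \mid V}(t_j \mid v)\, g_V(v)}{\int_{\mathbb{R}^p} f_{\tau_j \mid V}(t_j \mid v)\, g_V(v)\,dv}
  = \frac{f_{\tau_j \mid V}(t_j \mid v)}{f_j(t_j)}\, g_V(v), \qquad (9.81)$$

where $f_j(t)$ is the unconditional density of $\tau_j$. To keep the notation simple, we
assume, as in the previous subsection, that $\xi_1 = 1$. For $t \in [T_1, T_2)$, default
information consists therefore of the default time $\tau_1$ and of the atom
$B := \{\tau_j > t,\ 2 \le j \le m\}$. Now we get, for $A \subset \mathbb{R}^p$,

$$P(V \in A \mid B, \tau_1) = \frac{P(\{V \in A\} \cap B \mid \tau_1)}{P(B \mid \tau_1)}
  = \int_A \frac{\prod_{j=2}^{m} \bar F_{\tau_j \mid V}(t \mid v)}{P(B \mid \tau_1)}\, g_{V \mid \tau_1}(v \mid \tau_1)\,dv$$

and

$$P(B \mid \tau_1) = \int_{\mathbb{R}^p} \prod_{j=2}^{m} \bar F_{\tau_j \mid V}(t \mid v)\, g_{V \mid \tau_1}(v \mid \tau_1)\,dv.$$

Hence

$$g_{V \mid \mathcal{H}_t}(v) = \frac{\prod_{j=2}^{m} \bar F_{\tau_j \mid V}(t \mid v)}{P(B \mid \tau_1)}\, \frac{f_{\tau_1 \mid V}(\tau_1 \mid v)}{f_1(\tau_1)}\, g_V(v). \qquad (9.82)$$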

For $t \ge T_2$ the conditional density $g_{V \mid \mathcal{H}_t}(v)$ can be determined analogously; we
omit the details. For models with Clayton threshold copula, explicit expressions for
$g_{V \mid \mathcal{H}_t}(v)$ can be given (see Example 9.50 below).


Martingale default intensities. In factor copula models we can give an intuitive
explanation for the dynamics of martingale default intensities. Suppose for
the moment that the factor vector $V$ is observable, so that the information available
to investors is given by the artificial filtration $\tilde{\mathcal{H}}_t = \mathcal{H}_t \vee \sigma(V)$, $t \ge 0$.
Since the $\tau_i$ are conditionally independent given $V$, by Proposition 9.39 the
martingale intensity of $\tau_i$ with respect to the large filtration $(\tilde{\mathcal{H}}_t)$ is given by
$\lambda_i(t \mid V) = f_{\tau_i \mid V}(t \mid V) / \bar F_{\tau_i \mid V}(t \mid V)$. Now, it is well known that the martingale
intensity of $\tau_i$ with respect to the internal filtration $(\mathcal{H}_t)$ can be computed by
projection, i.e. $\lambda_{t,i} = E(\lambda_i(t \mid V) \mid \mathcal{H}_t)$ (see, for example, Theorem 14 in Chapter 2 of
Brémaud (1981)). Hence we get

$$\lambda_{t,i} = \int_{\mathbb{R}^p} \frac{f_{\tau_i \mid V}(t \mid v)}{\bar F_{\tau_i \mid V}(t \mid v)}\, g_{V \mid \mathcal{H}_t}(v)\,dv. \qquad (9.83)$$
Example 9.50 (Clayton copula model). For models with a Clayton threshold
copula the conditional density $g_{V \mid \mathcal{H}_t}(v)$ and the martingale default intensities
$(\lambda_{t,i})$ can be computed explicitly. Recall that the Clayton copula with parameter
$\theta > 0$ is an LT-Archimedean copula model, as introduced in Example 9.44,
with $V \sim \text{Ga}(1/\theta, 1)$. Fix $\theta > 0$ and denote by $\hat G$ and $\hat G^{-1}$ the Laplace–Stieltjes
transform of the $\text{Ga}(1/\theta, 1)$ distribution and its functional inverse. Recall that for
arbitrary $\alpha, \beta > 0$ the density $g(v; \alpha, \beta)$ of the $\text{Ga}(\alpha, \beta)$ distribution satisfies
$g(v; \alpha, \beta) \propto v^{\alpha - 1} \exp(-\beta v)$, where “$\propto$” denotes “is proportional to”. As shown
in Example 9.44, in LT-Archimedean threshold copula models the conditional survival
function $\bar F_{\tau_i \mid V}(t \mid v)$ equals $\exp(-v \hat G^{-1}(\bar F_i(t)))$. Hence the density of $\tau_i$ given
$V = v$ is given by

$$f_{\tau_i \mid V}(t \mid v) = -\frac{\partial}{\partial t} \bar F_{\tau_i \mid V}(t \mid v)
  = v \exp\big(-v \hat G^{-1}(\bar F_i(t))\big)\, \frac{-f_i(t)}{\hat G'(\hat G^{-1}(\bar F_i(t)))}.$$


We obtain, for $t < T_1$, using (9.80),

$$g_{V \mid \mathcal{H}_t}(v) \propto v^{1/\theta - 1} \exp\Big(-v \Big(1 + \sum_{i=1}^{m} \hat G^{-1}(\bar F_i(t))\Big)\Big), \quad t < T_1. \qquad (9.84)$$

Hence, for $t < T_1$, the conditional distribution of $V$ given $\mathcal{H}_t$ is again a gamma
distribution but now with parameters $\alpha = 1/\theta$ and $\beta = 1 + \sum_{i=1}^{m} \hat G^{-1}(\bar F_i(t))$.
Recall that the mean of a $\text{Ga}(\alpha, \beta)$ distributed rv equals $\alpha/\beta$. Hence the conditional
mean of $V$ given $T_1 > t$ is lower than the unconditional mean of $V$. This is in line
with economic intuition, since the fact that $T_1 > t$ is “good news” for the portfolio.
According to (9.81), the density of $V$ given $\tau_1$ satisfies

$$g_{V \mid \tau_1}(v \mid \tau_1) \propto v \exp\{-v \hat G^{-1}(\bar F_1(\tau_1))\}\, g(v; 1/\theta, 1)
  \propto v^{1/\theta} \exp\{-v (1 + \hat G^{-1}(\bar F_1(\tau_1)))\}, \qquad (9.85)$$

so, given $\tau_1$, $V$ is gamma distributed with parameters $\alpha = 1 + 1/\theta$ and
$\beta = 1 + \hat G^{-1}(\bar F_1(\tau_1))$. It is instructive to look at the impact of $\tau_1$ on the mean of
the conditional distribution of $V$, given by $(1/\theta + 1)/(1 + \hat G^{-1}(\bar F_1(\tau_1)))$. Suppose that $\tau_1$
occurs unusually early, i.e. that $\bar F_1(\tau_1)$ is close to one. This implies that $\hat G^{-1}(\bar F_1(\tau_1))$
is close to zero, and the mean of the conditional distribution is bigger than the
unconditional mean $1/\theta$. Since the conditional survival functions $\bar F_{\tau_i \mid V}(t \mid v)$ are
decreasing in $v$, the conditional survival probabilities of the remaining obligors are
thus decreased. A similar qualitative reasoning applies if $\tau_1$ occurs late in the sense
that $\bar F_1(\tau_1)$ is close to zero; obviously, in that case conditional survival probabilities
are increased.

Next we turn to the case $t \in [T_1, T_2)$. For notational simplicity again we assume
that $\xi_1 = 1$. Using (9.82), it is easily seen that, given $\mathcal{H}_t$, $V$ follows a gamma
distribution with parameters $\alpha = 1 + 1/\theta$ and
$\beta = 1 + \hat G^{-1}(\bar F_1(\tau_1)) + \sum_{j=2}^{m} \hat G^{-1}(\bar F_j(t))$.
Note that, in $T_1$, the conditional mean $\mu_{V \mid \mathcal{H}_t}$ of $V$ jumps upwards: we have

$$\lim_{t \uparrow T_1} \mu_{V \mid \mathcal{H}_t} = \frac{1/\theta}{1 + \sum_{j=1}^{m} \hat G^{-1}(\bar F_j(T_1))}, \qquad
  \mu_{V \mid \mathcal{H}_{T_1}} = \frac{1 + 1/\theta}{1 + \sum_{j=1}^{m} \hat G^{-1}(\bar F_j(T_1))}.$$

Finally, we compute the martingale intensity $\lambda_{t,i}$ using (9.83). We obtain

$$\lambda_{t,i} = E\Big(\frac{-f_i(t)\, V}{\hat G'(\hat G^{-1}(\bar F_i(t)))} \,\Big|\, \mathcal{H}_t\Big)
  = \frac{-f_i(t)}{\hat G'(\hat G^{-1}(\bar F_i(t)))}\, E(V \mid \mathcal{H}_t).$$

Figure 9.12. Paths of the default intensity $(\lambda_{t,i})$ in the Clayton copula model, assuming that
the first default time $T_1$ equals four months. The parameters are as follows: portfolio size
$m = 100$; marginal default intensity $\gamma = 0.02$; one-year default correlation 2% (alternatively,
0.5%). As one would expect, a higher default correlation implies a stronger contagion effect.


The martingale default intensity is thus proportional to $E(V \mid \mathcal{H}_t)$, the conditional
mean of $V$ given $\mathcal{H}_t$. In particular, $\lambda_{t,i}$ jumps upward at each successive default
time $T_n$ and decreases gradually between defaults. This is illustrated in Figure 9.12.
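
The qualitative behaviour described here (upward jumps at defaults, gradual decay in
between) can be reproduced with a short computation. The sketch below is a minimal
illustration under assumptions that go beyond the text: a homogeneous portfolio with
exponential margins (common hazard rate `gamma`), so that $\hat G^{-1}(\bar F_i(t)) = e^{\gamma\theta t} - 1$
and $-f_i(t)/\hat G'(\hat G^{-1}(\bar F_i(t))) = \gamma\theta e^{\gamma\theta t}$; and the natural extension of the gamma
update to $n \ge 2$ observed defaults, namely $\alpha = 1/\theta + n$ with $\beta$ collecting one term per
firm, which the example states explicitly only for $n \le 1$. The function name and the
parameter values in the usage note are hypothetical.

```python
import math

def clayton_intensity(t, default_times, m, gamma, theta):
    """Martingale default intensity of a surviving firm at time t in a
    homogeneous Clayton model with exponential margins.

    default_times: default times observed up to and including t.
    Uses V | H_t ~ Ga(alpha, beta) and
    lambda = gamma * theta * exp(gamma * theta * t) * E(V | H_t).
    """
    def ginv(s):
        # G^{-1}(F-bar(s)) for the Ga(1/theta, 1) Laplace transform
        return math.exp(gamma * theta * s) - 1.0

    n = len(default_times)
    alpha = 1.0 / theta + n
    beta = 1.0 + sum(ginv(s) for s in default_times) + (m - n) * ginv(t)
    conditional_mean_v = alpha / beta
    return gamma * theta * math.exp(gamma * theta * t) * conditional_mean_v
```

With `m = 100`, `gamma = 0.02` and an assumed `theta = 0.5`, the intensity starts at
`gamma` for `t = 0`, decays until the first default, and is multiplied by
$(1/\theta + 1)/(1/\theta) = 1 + \theta$ at the first default time, since $\beta$ is continuous there.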


9.8.3 Interacting Intensities 


In copula models the dependence structure of the default times is exogenously 
specified; the form of the resulting default contagion can then be computed from the 
model primitives. In models with interacting intensities, on the other hand, the impact 
of defaults on the default intensities of surviving firms is exogenously specified; the 
joint distribution of the default times is then endogenously derived. This leads to 
a very intuitive parametrization of counterparty risk and default dependence. The 
main drawback of models with interacting intensities is the fact that the marginal 
distribution of individual default times is typically not available in closed form, so 
the calibration of the model to defaultable term structure data is more involved than
in copula models.

In models with interacting intensities the martingale default intensity of firm $i$
belonging to a given portfolio is given by an exogenously specified function $\lambda_i(t, Y_t)$
of time and the current state $Y_t$ of the portfolio. The dependence on the current state
of the portfolio is the major innovation of the model; in this way, counterparty
risk can be modelled explicitly. Suppose, for instance, that firm $i$ is a commercial
bank and that firm $j$ is a major borrower from bank $i$, so that we expect the
conditional default probability of firm $i$ to increase given that firm $j$ defaults. This
can be modelled by taking $\lambda_i(t, y) = a_{i0}(t) + a_{i1}(t) I_{\{y_j = 1\}}(y)$ for non-negative
and bounded functions $a_{i0}, a_{i1} : [0, \infty) \to \mathbb{R}^+$. It is straightforward to extend the
model to stochastic default intensities of the form $\lambda_i(\Psi_t, Y_t)$ for some observable
background process $(\Psi_t)$ (references are given in Notes and Comments).

It is convenient to model the default indicator process $(Y_t)$ in a model with
interacting intensities as a time-inhomogeneous continuous-time Markov chain. In this
way the computational tools from the theory of Markov chains can be used for the
analysis and simulation of the model. Below we summarize a few essential facts 
about continuous-time Markov chains; several textbooks on stochastic processes 
containing a detailed discussion of continuous-time Markov chains are listed in 
Notes and Comments. 


Continuous-time Markov chains. A time-inhomogeneous continuous-time Markov
chain $(X_t)$ on a finite state space $S$ is characterized by non-negative and bounded
transition rate functions $\lambda(t, x, y)$, $x, y \in S$, $x \ne y$, $t \ge 0$, with the following
interpretation. Fix $t \ge 0$ and let $T := \inf\{s \ge t : X_s \ne X_t\}$, i.e. $T$ gives the time
of the first jump of the chain after time $t$. Define, for $x \in S$,

$$\lambda(t, x, x) := -\sum_{y \in S,\ y \ne x} \lambda(t, x, y), \quad t \ge 0,$$

and denote by $\mathcal{H}_t := \sigma(\{X_s : s \le t\})$ the internal filtration of the chain. Then

$$P(T > s \mid \mathcal{H}_t) = P(T > s \mid X_t) = \exp\Big(\int_t^s \lambda(u, X_t, X_t)\,du\Big), \quad s \ge t. \qquad (9.86)$$

In the special case of a time-homogeneous Markov chain where the transition rate
functions are independent of time, given $\mathcal{H}_t$, the rv $(T - t)$ (the waiting time for the
next jump after time $t$) is thus $\text{Exp}(-\lambda(X_t, X_t))$ distributed. Moreover, we have,
for $y \in S$ with $y \ne X_t$ and $T$ as before,

$$P(X_T = y \mid \mathcal{H}_t, T) = -\lambda(T, X_t, y)/\lambda(T, X_t, X_t); \qquad (9.87)$$


given that the chain has a jump at time $t$, the probability of jumping to a particular
state $y$ is thus proportional to the transition rate $\lambda(t, X_{t-}, y)$, where $X_{t-}$ denotes the
state of the chain immediately before the jump. Next we introduce the generator $G_{[t]}$,
$t \ge 0$, of the chain $(X_t)$. For fixed $t$ the operator $G_{[t]}$ associates with every function
$f : S \to \mathbb{R}$ a new function $G_{[t]} f : S \to \mathbb{R}$ with

$$G_{[t]} f(x) = \sum_{y \in S,\ y \ne x} \lambda(t, x, y)\,(f(y) - f(x)). \qquad (9.88)$$

The generator is a very useful mathematical object for a number of reasons. First, it
is a well-known result that, for any $f : S \to \mathbb{R}$, the process

$$M_t^f := f(X_t) - \int_0^t G_{[s]} f(X_s)\,ds, \quad t \ge 0, \qquad (9.89)$$

is an $(\mathcal{H}_t)$-martingale. Second, as explained below, the generator appears in the
Kolmogorov equations, a system of ODEs characterizing the transition probabilities
of the chain $(X_t)$.


Construction of interacting intensities via Markov chains. Now we turn to the
formal construction of models with interacting intensities. Set $S := \{0, 1\}^m$ and
define, for $y \in S$ and $i \in \{1, \dots, m\}$, the state $y^i$ by $y^i_j = y_j$ for $j \in \{1, \dots, m\} \setminus \{i\}$
and $y^i_i = 1 - y_i$, i.e. $y^i$ is constructed from $y$ by flipping the $i$th coordinate. Given,
for $1 \le i \le m$, non-negative and bounded functions $\lambda_i : [0, \infty) \times S \to \mathbb{R}^+$ (the
candidate martingale default intensities), we define the default indicator process
$(Y_t)$ as a time-inhomogeneous continuous-time Markov chain with state space $S$
and transition rates

$$\lambda(t, y, x) = \begin{cases} I_{\{y_i = 0\}} \lambda_i(t, y), & \text{if } x = y^i \text{ for some } i \in \{1, \dots, m\}, \\ 0, & \text{otherwise.} \end{cases} \qquad (9.90)$$

Relation (9.90) implies that the chain can jump only to those neighbouring states $y^i$
that differ from the current state $Y_t$ by exactly one default; in particular, there are
no joint defaults. If $Y_{t,i} = 0$, the probability that firm $i$ defaults in the small time
interval $[t, t+h)$, i.e. the probability of jumping to the neighbouring state $y^i$ in
$[t, t+h)$, is approximately equal to $h \lambda_i(t, Y_t)$. The generator of $(Y_t)$ is given by

$$G_{[t]} f(y) = \sum_{i=1}^{m} I_{\{y_i = 0\}} \lambda_i(t, y)\,(f(y^i) - f(y)), \quad y \in S. \qquad (9.91)$$

The definition of $(Y_t)$ suggests that $(\lambda_i(t, Y_t))$ is the martingale default intensity of
firm $i$. Using (9.89), a formal proof is easy. Let $f_i(y) = y_i$, so $Y_{t,i} = f_i(Y_t)$ and
$G_{[t]} f_i(y) = I_{\{y_i = 0\}} \lambda_i(t, y)$. Hence

$$Y_{t,i} - \int_0^{t \wedge \tau_i} \lambda_i(s, Y_s)\,ds = f_i(Y_t) - \int_0^t G_{[s]} f_i(Y_s)\,ds$$

is a martingale by (9.89).


Transition functions and Kolmogorov equations. The transition probabilities of
the chain $(Y_t)$ are given by

$$p(t, s, x, y) := P(Y_s = y \mid Y_t = x), \quad x, y \in S,\ 0 \le t \le s < \infty. \qquad (9.92)$$

It is well known that the function $p(t, s, x, y)$ satisfies the Kolmogorov backward
and forward equations. These equations are very useful numerical tools in the analysis
of the model. The backward equation is a system of ODEs for the function
$(t, x) \to p(t, s, x, y)$, $0 \le t \le s$; $s$ and $y$ are considered as parameters. The
general form of the equation is $(\partial/\partial t)\,p(t, s, x, y) + G_{[t]}\, p(t, s, x, y) = 0$, with
terminal condition $p(s, s, x, y) = I_{\{y\}}(x)$. In our model this leads to the following
system of ODEs:

$$\frac{\partial p(t, s, x, y)}{\partial t} + \sum_{i=1}^{m} I_{\{x_i = 0\}} \lambda_i(t, x)\,\big(p(t, s, x^i, y) - p(t, s, x, y)\big) = 0.$$

The forward equation is an ODE system for the function $(s, y) \to p(t, s, x, y)$,
$s \ge t$, which is governed by the adjoint operator $G^*_{[s]}$ of the generator $G_{[s]}$. The
derivation of the precise form of the equation is slightly more involved and we refer
to Frey and Backhaus (2004) for details.
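
For a small portfolio the backward system can be integrated directly. The following
Python sketch is one possible way to do this and is not taken from the references
above: it enumerates the $2^m$ states as tuples, encodes the terminal condition
$p(s, s, x, y) = I_{\{y\}}(x)$, and steps backwards from $s$ to $0$ with the classical
fourth-order Runge–Kutta scheme. The function name `backward_solve`, the signature
of `intensity` and the step count are assumptions made for the illustration.

```python
from itertools import product

def backward_solve(intensity, m, s, y, steps=400):
    """Solve the Kolmogorov backward equation on [0, s] for fixed (s, y).

    intensity(i, t, x) -> lambda_i(t, x) for a state x given as a tuple of 0/1.
    Returns a dict mapping each state x to p(0, s, x, y).
    """
    states = list(product((0, 1), repeat=m))

    def flip(x, i):
        return x[:i] + (1 - x[i],) + x[i + 1:]

    def generator(t, p):
        # (G_[t] p)(x) = sum_i 1{x_i = 0} lambda_i(t, x) (p(x^i) - p(x))
        return {x: sum(intensity(i, t, x) * (p[flip(x, i)] - p[x])
                       for i in range(m) if x[i] == 0)
                for x in states}

    def shifted(p, a, d):
        return {x: p[x] + a * d[x] for x in states}

    # terminal condition p(s, s, x, y) = 1{x = y}
    p = {x: 1.0 if x == y else 0.0 for x in states}
    h = s / steps
    t = s
    for _ in range(steps):
        # dp/dt = -G_[t] p, integrated backwards from t to t - h
        k1 = generator(t, p)
        k2 = generator(t - h / 2, shifted(p, h / 2, k1))
        k3 = generator(t - h / 2, shifted(p, h / 2, k2))
        k4 = generator(t - h, shifted(p, h, k3))
        p = {x: p[x] + h / 6 * (k1[x] + 2 * k2[x] + 2 * k3[x] + k4[x])
             for x in states}
        t -= h
    return p
```

With constant, non-interacting intensities the result can be checked against the
product of exponential survival and default probabilities; the cost grows like $2^m$ per
step, which is why this approach is only practical for small $m$.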

For $m$ small, the ODE systems corresponding to the Kolmogorov backward (and
forward) equation are easily solved numerically. Note, however, that the cardinality
of the state space equals $|S| = 2^m$, so for $m$ large the Kolmogorov equations are no
longer useful and one has to resort to simulation. A model with interacting intensities
is easily simulated using a variant of Algorithm 9.38 (recursive default time
simulation) (see, for example, Appendix C of Lando (2004) for details). Alternatively, one
may reduce the size of the state space by considering a model with a homogeneous
group structure, as will be explained below.
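
To indicate what such a simulation can look like, here is a minimal sketch for the
time-homogeneous special case, where intensities depend on the state but not on time.
It is not Algorithm 9.38 itself but a standard jump-chain simulation based on (9.86)
and (9.87): draw an exponential waiting time with the total default rate of the
surviving firms, then select the defaulting firm with probability proportional to its
intensity. The function name and the signature of `intensity` are hypothetical;
time-dependent intensities would need an additional device such as thinning.

```python
import random

def simulate_defaults(intensity, m, horizon, rng):
    """Simulate one path of a time-homogeneous default indicator chain.

    intensity(i, y) -> default intensity of surviving firm i in state y
    (y is a list of 0/1 default flags). Returns a chronological list of
    (default_time, firm) pairs with default_time <= horizon.
    """
    y = [0] * m
    t = 0.0
    path = []
    while True:
        rates = [intensity(i, y) if y[i] == 0 else 0.0 for i in range(m)]
        total = sum(rates)
        if total <= 0.0:
            break                        # no surviving firm can default
        t += rng.expovariate(total)      # waiting time ~ Exp(total rate)
        if t > horizon:
            break
        # choose the defaulting firm with probability rate_i / total
        u = rng.random() * total
        acc = 0.0
        firm = None
        for i, rate in enumerate(rates):
            if rate <= 0.0:
                continue
            acc += rate
            firm = i
            if u < acc:
                break
        y[firm] = 1
        path.append((t, firm))
    return path
```

A call such as `simulate_defaults(lambda i, y: 0.01 + 0.05 * any(y), 20, 5.0, random.Random(1))`
produces one scenario for an illustrative contagion specification; repeating the call
gives a Monte Carlo sample of default scenarios.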


Models for the default intensities. The functions $\lambda_i(t, y)$ are an essential ingredient
in any model with interacting intensities. We therefore discuss several specifications
proposed in the literature. Jarrow and Yu (2001) study a model with stochastic
background process $(\Psi_t)$, but restrict themselves to a special form of interacting
intensities called the primary–secondary framework. In this framework firms are
divided into two classes: primary and secondary. The default intensity of primary
firms depends only on $(\Psi_t)$; the default intensity of secondary firms depends on $(\Psi_t)$
and on the default state of the primary firms. This simplifying assumption facilitates
the mathematical analysis of the model. Below we present a specific example from
their paper. We let $m = 2$ and identify $(\Psi_t)$ with the short rate of interest $(r_t)$. The
default intensities are then given by

$$\lambda_1(r_t, Y_t) = a_{10} + a_{11} r_t \quad \text{and} \quad \lambda_2(r_t, Y_t) = a_{20} + a_{21} r_t + a_{22} I_{\{Y_{t,1} = 1\}},$$

so company one is a primary firm and company two is a secondary firm. A typical
scenario for the primary–secondary framework is as follows: primary firms correspond
to large corporations; secondary firms correspond to commercial banks which
have a major credit exposure to the primary firms. Note that, under the
primary–secondary framework, cyclical default dependence, such as a situation where the
default intensity of firm $i$ is affected by the default of firm $j \ne i$, and vice versa,
cannot be modelled.
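
Because the primary firm is unaffected by the secondary firm, survival probabilities in
this two-firm example can be obtained by conditioning on the default time of firm one.
The sketch below illustrates this under a simplifying assumption that is not made in
the original example, namely a constant short rate `r`, so that both intensities are
constant between defaults. The function name, the parameter values used for checking
and the trapezoidal integration are likewise choices made for the illustration.

```python
import math

def secondary_survival(T, a10, a11, a20, a21, a22, r, steps=2000):
    """P(tau_2 > T) in the two-firm primary-secondary example with a
    constant short rate r, by conditioning on the primary default time."""
    lam1 = a10 + a11 * r     # intensity of the primary firm
    base2 = a20 + a21 * r    # intensity of firm 2 while firm 1 is alive
    # firm 1 survives beyond T: firm 2 faces base2 on all of [0, T]
    total = math.exp(-lam1 * T) * math.exp(-base2 * T)
    # firm 1 defaults at s <= T: firm 2 faces base2 + a22 on (s, T]
    h = T / steps
    integral = 0.0
    for k in range(steps + 1):
        s = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        density = lam1 * math.exp(-lam1 * s)
        integral += weight * density * math.exp(-base2 * T - a22 * (T - s))
    return total + integral * h
```

Setting `a22 = 0` removes the contagion term and the result reduces to
$\exp(-(a_{20} + a_{21} r) T)$; a positive `a22` lowers the survival probability of the
secondary firm.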

Yu (2005b) analyses a model where the whole portfolio enters an “enhanced risk
state” after the first default. Default intensities of the form

$$\lambda_i(t, Y_t) = a_0 + a_1 I_{\{Y_t \ne 0\}}, \quad i \in \{1, \dots, m\},\ a_0, a_1 > 0 \qquad (9.93)$$

are used. Hence, at the first default time $T_1$, the default intensities of the surviving
firms jump from $a_0$ to $a_0 + a_1$. The assumption of identical default intensities
for all firms implies that the portfolio is homogeneous, i.e. that the default times
$(\tau_1, \dots, \tau_m)$ are exchangeable. Yu suggests that for a portfolio of high-quality credits
a reasonable order of magnitude for the model parameters is $a_0 \approx 1\%$ and
$a_1 \approx 0.1\%$. Simulation studies reported in his paper indicate that the model might
be able to explain certain features of credit spreads in the market for European
telecom bonds.

Frey and Backhaus (2004) study a model where the default intensity of a given
firm depends on the overall proportion of companies that have defaulted so far. The
homogeneous-portfolio version of the model can be described as follows. Denote the
proportion of defaulted companies in state $y$ by $\bar m(y) := (1/m) \sum_{i=1}^{m} y_i$, for $y \in S$.
Then

$$\lambda_i(t, Y_t) = h(t, \bar m(Y_t)) \qquad (9.94)$$

for some bounded function $h : \mathbb{R}^+ \times [0, 1] \to \mathbb{R}^+$ that is increasing in its second
argument. This type of interaction between default intensities makes immediate
sense: to begin with, if a financial institution has incurred an unusually high number
of losses in its loan portfolio, it is less likely to extend credit lines if another obligor
experiences financial distress. Obviously, this raises the default probability of the
remaining obligors. Moreover, an unusually high number of defaults might have
a negative impact on the overall business climate. From a mathematical point of
view, if we assume that the default times $(\tau_1, \dots, \tau_m)$ are exchangeable, the default
intensities are necessarily of the form (9.94). For instance, the default intensities in
the homogeneous model (9.93) are of the form (9.94) with $h(t, l) = a_0 + a_1 I_{\{l > 0\}}$.


Exchangeable models. We conclude with a few results for the exchangeable
case (9.94). For default intensities of the form (9.94), the process $(M_t)$ with
$M_t = \bar m(Y_t)$ is itself a Markov chain with state space $S^M = \{0, 1/m, \dots, 1\}$. In fact,
at time $t$ the process $M_t$ can only jump to the state $M_t + 1/m$, which happens with
intensity $\sum_{i=1}^{m} (1 - Y_{t,i}) \lambda_i(t, Y_t) = m(1 - M_t) h(t, M_t)$. This shows that $(M_t)$ is
itself a Markov chain with generator

$$G^M_{[t]} f(l) = m(1 - l)\, h(t, l)\,\big(f(l + 1/m) - f(l)\big). \qquad (9.95)$$

Note that the state space $S^M$ of $(M_t)$ is of size $m + 1$, whereas the state space of $(Y_t)$ is
of size $2^m$. Hence, under the interaction (9.94), the distribution of $M_T$ can be inferred
using analytical tools such as the Kolmogorov equations, even for $m$ relatively large.
In the exchangeable model (9.94), many quantities of interest can be easily computed
from the distribution of $M_T$. For instance, since $M_T = (1/m) \sum_{i=1}^{m} Y_{T,i}$ and the firms are
exchangeable, we obtain for the default probability $\pi$ of some firm $i$ from our
portfolio $\pi = E(M_T)$, and similar expressions can be
obtained for the higher-order default probabilities $\pi_k$ introduced in Section 8.3.1.
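
The reduction to $m + 1$ states makes a direct numerical treatment feasible. As a
hedged illustration, the sketch below integrates the forward (master) equation of the
pure-birth chain behind (9.95), written for the number of defaults $K_t = m M_t$, whose
jump rate in state $k$ is $(m - k)\, h(t, k/m)$. The function names, the Runge–Kutta
scheme and the step count are choices made for this example and are not taken from
Frey and Backhaus (2004).

```python
def default_count_distribution(h, m, T, steps=2000):
    """Distribution of the number of defaults K_T = m * M_T in the
    exchangeable model. h(t, l) is the common default intensity when a
    proportion l of the firms has defaulted.

    Returns a list p with p[k] = P(K_T = k) for k = 0, ..., m.
    """
    def rhs(t, p):
        out = [0.0] * (m + 1)
        for k in range(m + 1):
            rate_out = (m - k) * h(t, k / m)   # intensity of the jump k -> k + 1
            out[k] -= rate_out * p[k]
            if k < m:
                out[k + 1] += rate_out * p[k]
        return out

    p = [1.0] + [0.0] * m                      # no defaults at time 0
    dt = T / steps
    t = 0.0
    for _ in range(steps):                     # classical fourth-order Runge-Kutta
        k1 = rhs(t, p)
        k2 = rhs(t + dt / 2, [a + dt / 2 * b for a, b in zip(p, k1)])
        k3 = rhs(t + dt / 2, [a + dt / 2 * b for a, b in zip(p, k2)])
        k4 = rhs(t + dt, [a + dt * b for a, b in zip(p, k3)])
        p = [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(p, k1, k2, k3, k4)]
        t += dt
    return p

def default_probability(p, m):
    """Default probability of a single firm: expected number of defaults / m."""
    return sum(k / m * pk for k, pk in enumerate(p))
```

For a constant `h` the defaults are independent and the output should match a binomial
distribution with success probability $1 - e^{-hT}$; an `h` that increases in its second
argument shifts mass towards larger numbers of defaults.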

Finally, we present some numerical results from Frey and Backhaus (2004) that
illustrate the impact of interacting intensities on default correlations and quantiles
of $M_T$. We consider a model with a stochastic background process given by a
one-dimensional CIR square-root diffusion, as in Section 9.5.2, with parameters
$\kappa = 0.03$, $\bar\theta = 0.005$, $\sigma = 0.016$ and initial value $\Psi_0 = 0$. These values have been
taken from the empirical study by Driessen (2005). The default intensities are given
by

$$h(t, \psi, \bar m) = \big[a_0 (0.004 + 5.707\,\psi) + a_1 \big(\bar m - (1 - e^{-\lambda t})\big)\big]^+,$$

where $\psi$ stands for the current value of the background process and $\bar m$ for the current
proportion of defaulted firms.

The interpretation of this model is as follows. The number $1 - e^{-\lambda t}$ measures the
expected proportion of defaulted firms at time $t$. For $a_1 > 0$ the default intensity
of non-defaulted companies is increased (decreased) if the proportion of defaulted
companies is higher (lower) than the expected proportion $1 - e^{-\lambda t}$ and we have
interaction between default events. For $a_1 = 0$, on the other hand, we are in a
standard model with conditionally independent defaults, as studied in Section 9.6.3.
We take the horizon to be $T = 1$ year. In the simulations, the parameter $a_1$, which
controls the strength of the interaction, is increased from 0 to 3; the parameter $a_0$
is adjusted in order to ensure that the one-year default probabilities $P(Y_{1,i} = 1)$
remain unchanged as we vary $a_1$. Simulation results are presented in Table 9.3 for
the case $m = 500$.

Table 9.3. Default correlation and quantiles of $M_1$ in a homogeneous model with
interacting intensities for $m = 500$ firms and varying interaction $a_1$.

$a_1$   $P(Y_{1,i} = 1)$   $\rho_Y$       $q_{0.9}$   $q_{0.95}$   $q_{0.975}$   $q_{0.99}$
0       0.03199            0.00041579     0.044       0.046        0.05          0.054
1       0.03198            0.0050753      0.052       0.058        0.066         0.072
3       0.03199            0.058283       0.096       0.128        0.156         0.19

We see that the default correlation $\rho(Y_{1,i}, Y_{1,j})$, $i \ne j$, and
the quantiles of the distribution of $M_1$ increase substantially as $a_1$ increases. This
makes perfect sense: for $a_1 > 0$ a higher (lower) than usual number of defaults in
the portfolio leads to an increase (decrease) in the default intensity of the remaining
firms in the portfolio and thus to a further increase (decrease) in the ratio of realized
versus expected defaults, so the resulting distribution of $M_1$ will have more mass in
the tails.


Notes and Comments 


Theorem 9.46 is taken from Brémaud (1981) but is originally due to Jacod (1975). Both texts
are excellent references that study point processes from the viewpoint of stochastic
calculus. Dynamic properties of copula models were first studied in Schönbucher and
Schubert (2001). The pricing of first-to-default swaps follows Laurent and Gregory
(2003).

The techniques used in Section 9.8.2 are popular in survival analysis (see, for 
example, Chapter 10 of Andersen et al. 1993); in the context of portfolio credit 
risk, related ideas can be found in Schönbucher (2004). Collin-Dufresne, Goldstein
and Helwege (2003) propose a model for information-based default contagion that 
starts from the mixture representation (9.69) of the survival function. Giesecke and 
Goldberg (2004) study information-based default contagion in a structural multi- 
firm model with incomplete information about the default thresholds. 

Our presentation of models with interacting intensities is based on Frey and
Backhaus (2004). The first model with interacting intensities is due to Jarrow and Yu
(2001). Davis and Lo (2001) pointed out the link between models with interacting
intensities and finite-state Markov chains. Mathematical aspects of the Jarrow–Yu
model are discussed in Kusuoka (1999), Bielecki and Rutkowski (2002) and
Collin-Dufresne, Goldstein and Hugonnier (2004). Yu (2005b) provides an alternative
construction of the Jarrow–Yu model using the general hazard construction
from survival analysis. Moreover, certain features of the model are studied using
simulation. The pricing of portfolio credit derivatives in models with interacting
intensities is discussed in Frey and Backhaus (2004); this paper also contains an
analysis of the asymptotic behaviour of the homogeneous model (9.94) for large
portfolios. Credit risk models with explicitly specified interaction between default
intensities are conceptually and mathematically close to models for interacting
particle systems developed in statistical physics. Föllmer (1994) contains an inspiring
discussion of the relevance of ideas from the interacting-particle-systems literature
for financial modelling; the link to credit risk is explored by Giesecke and Weber
(2004, 2005), Horst (2004) and Focardi and Fabozzi (2004). Egloff, Leippold and
Vanini (2004) study credit contagion in a firm-value model. Allen and Gale (2000)
discuss financial contagion from a financial economics viewpoint; an interesting
analysis of systemic risk in financial networks in general can be found in Eisenberg
and Noe (2001).

Many textbooks on stochastic processes contain an introduction to continuous- 
time Markov chains. Excellent texts are Resnick (1992), Davis (1993) and Norris 
(1997); a good summary is given in Appendix C of Lando (2004). Continuous-time 
Markov chains are frequently used to build dynamic models for rating-transitions 
(see, for example, Jarrow, Lando and Turnbull (1997) or Chapter 6 of Lando (2004)). 


10 


Operational Risk and Insurance Analytics 


We have so far concentrated on the modelling of market and credit risk, which reflects 
the historical development of quantitative risk management in the banking context. 
Some of the techniques we have discussed are also relevant in operational risk 
modelling, in particular the techniques of extreme value theory (EVT) in Chapter 7 
and the aggregation methodology of Chapter 6. But we also need other techniques 
tailored specifically to operational risk, and we believe that actuarial models used 
in non-life insurance are particularly relevant. 

In the first half of this chapter (Section 10.1) we examine the Basel II require- 
ments for the quantitative modelling of operational risk, discussing various potential 
approaches. On the basis of some industry data we highlight the possibilities and 
limitations of existing tools for the calculation of an operational risk-capital charge. 

In Section 10.2 we summarize the techniques from actuarial modelling that are 
relevant to operational risk, under the heading of insurance analytics. Our discussion 
in that section, though motivated by quantitative modelling of operational risk, has 
a much wider applicability in quantitative risk management. For example, some 
techniques have implicitly been used in the credit risk chapters. The Notes and 
Comments section at the end of the chapter gives an overview of further techniques 
from insurance mathematics that we feel will become useful in the years to come. 


10.1 Operational Risk in Perspective 
10.1.1 A New Risk Class 


In our overview of Basel II, in Section 1.3.1, we introduced operational risk as
a new risk class for which financial institutions, bound by the Basel Committee 
rules (Basel II) and to some extent also by Solvency 2 (Section 1.3.2), are required 
to put aside regulatory capital. We first recall the Basel II definition as it appears in 
the final document (Basel Committee on Banking Supervision 2004). 


Operational risk is defined as the risk of loss resulting from inade- 
quate or failed internal processes, people and systems or from external 
events. This definition includes legal risk, but excludes strategic and 
reputational risk. 


Examples of losses falling within this category are, for instance, fraud (internal 
as well as external), losses due to IT failures, errors in settlements of transactions, 
litigation and losses due to external events like flooding, fire, earthquake or terrorism.
Losses due to unfortunate management decisions, such as many of the mergers and
acquisitions of the 1990s or the launch of larger-scale bank-assurance projects, are 
definitely not included. 

A case that touched upon almost all aspects of the above definition was that of 
Barings (see also Section 1.2.2). From insufficient internal checks and balances 
(processes), to fraud (human risk), to external events (the Kobe earthquake), many 
operational risk factors contributed to the downfall of this once proud merchant bank. 
Further examples include the $691 million rogue trading loss at Allfirst Financial, 
the $484 million settlement due to misleading sales practices at Household Finance, 
and the estimated $140 million loss for the Bank of New York stemming from the 
September 11 attacks. These examples clearly demonstrate the fundamental importance 
of operational risk as a risk class to be monitored. Current estimates for capital 
allocated to operational risk at large international banks are in the range $2-7 billion 
(see deFontnouvelle et al. 2003). 

An essential difference between operational risk, on the one hand, and market 
and credit risk, on the other, is that operational risk has no upside for a bank. It 
comes about through the malfunctioning of parts of daily business and hence is as 
much a question of quality control as anything else. Clearly, banks try as hard as 
possible to avoid operational risk but, despite their best efforts, operational losses 
will continue to occur. 

This has prompted the Basel Committee to decide that banks must set aside risk 
capital under Pillar I of the three-pillar system (see Section 1.3.1). The Pillar II and 
Pillar III proposals of the new accord imply that a supervisory review process for 
operational risk must also be put in place and that an appropriate market discipline 
with respect to public disclosure must be adhered to. The market has not been slow 
to provide various ways of mitigating the effects of the new risk category, ranging 
from IT solutions and data warehouses to improve the measurement of operational 
risk, to insurance-type solutions for banks willing and able to enter into such deals. 

Currently, and for the foreseeable future, the lack of operational loss data is a major 
issue, and this is similar to the problem faced by underwriters of catastrophe insur- 
ance. The insurance industry’s answer to the problem has involved data-pooling 
across industry participants and a similar discussion is now taking place in the 
banking industry. Once representative data sources become available, the imple- 
mentation of many of the methods discussed in this book (such as EVT in Chapter 7 
and the insurance analytics of Section 10.2) will become increasingly feasible. Exist- 
ing sources of data at present are the databases produced by the Quantitative Impact 
Studies (QISs) of the Basel Committee and by the Federal Reserve Bank of Boston. 
Moreover, some private companies are also providing data. 

In Section 10.1.3 we discuss the kind of advanced-measurement (AM) approach 
that an analysis of operational loss data allows. Before this we discuss so-called 
elementary approaches to operational risk modelling. In these approaches, aimed 
at smaller banks without extensive international activities, the detailed modelling 
of loss distributions for different risk classes and risk types is not required; a fairly 
simple volume-based capital charge is proposed. We note that, as in the case of credit 
risk, the approaches proposed by Basel II for the calculation of regulatory capital 
represent a gradation in complexity. Recall that, for credit risk, banks must imple- 
ment either the standardized approach or the internal-ratings-based (IRB) approach, 
as discussed in Section 1.3.1 and Section 8.1. 


10.1.2 The Elementary Approaches 


There are two elementary approaches to operational risk measurement. Under the 
basic-indicator (BI) approach, banks must hold capital for operational risk equal to 
the average over the previous three years of a fixed percentage (denoted by $\alpha$) of 
positive annual gross income (GI). Figures for any year in which annual gross income 
is negative or zero should be excluded from both the numerator and denominator 
when calculating the average. Hence the risk capital under the BI approach for 
operational risk in year $t$ is given by 

$$RC_{BI}^{t}(OR) = \frac{1}{Z_t} \sum_{i=1}^{3} \alpha \max(GI^{t-i}, 0), \tag{10.1}$$

where $Z_t = \sum_{i=1}^{3} I_{\{GI^{t-i} > 0\}}$ counts the years with positive gross income 
and $GI^{t-i}$ stands for gross income in year $t - i$. 
Note that an operational risk-capital charge is calculated on a yearly basis. The 
BI approach gives a fairly straightforward, volume-based, one-size-fits-all capital 
charge. Based on the various QISs, the Basel Committee suggests that $\alpha = 15\%$. 
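
The rule in (10.1) can be sketched in a few lines of Python. This is only an illustration: the function name `bi_capital`, the income figures and the convention of returning zero when no year has positive gross income are our assumptions, not part of the Basel text.

```python
def bi_capital(gross_income, alpha=0.15):
    """Basic-indicator charge (10.1).

    gross_income: the annual figures GI^{t-1}, GI^{t-2}, GI^{t-3}.
    Years with zero or negative gross income drop out of both the
    sum and the count Z_t.
    """
    positive = [gi for gi in gross_income if gi > 0]
    if not positive:
        # Z_t = 0 leaves (10.1) undefined; returning 0.0 is our assumption.
        return 0.0
    return alpha * sum(positive) / len(positive)


# Hypothetical gross income (in millions) with one loss-making year.
charge = bi_capital([120.0, -30.0, 90.0])
print(charge)  # alpha * (120 + 90) / 2
```

The loss-making year is dropped from the average rather than counted as zero, so the charge is based on two years instead of three.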

Under the standardized (S) approach, banks’ activities are divided into eight busi- 
ness lines: corporate finance; trading & sales; retail banking; commercial banking; 
payment & settlement; agency services; asset management; and retail brokerage. 
Precise definitions of these business lines are to be found in the Basel Committee’s 
final document (Basel Committee on Banking Supervision 2004). Within each busi- 
ness line, gross income is a broad indicator that serves as a proxy for the scale of 
business operations and thus the likely scale of operational risk exposure. The capital 
charge for each business line is calculated by multiplying gross income by a factor 
(denoted by $\beta$) assigned to that business line. As in (10.1), the total capital charge 
is calculated as a three-year average over positive GIs, resulting in the following 
capital charge formula: 

$$RC_{S}^{t}(OR) = \frac{1}{3} \sum_{i=1}^{3} \max\Bigl( \sum_{j=1}^{8} \beta_j GI_j^{t-i},\ 0 \Bigr). \tag{10.2}$$


Note that in formula (10.2), in any given year $t - i$, negative capital 
charges (resulting from negative gross income) in some business line j may offset 
positive capital charges in other business lines (albeit at the discretion of the national 
supervisor). This kind of “netting” should induce banks to go from the basic indicator 
to the standardized approach; the word “netting” is of course to be used with care 
here. Based on the QISs, the Basel Committee has set the beta coefficients as in 
Table 10.1. Moscadelli (2004) gives a critical analysis of these beta factors, based 
on the full database of more than 47 000 operational losses of the second QIS of the 
summer of 2002 (see also Section 10.1.4). 


Table 10.1. Beta factors for the standardized approach. 

Business line ($j$)               Beta factor ($\beta_j$) 
$j = 1$, corporate finance        18% 
$j = 2$, trading & sales          18% 
$j = 3$, retail banking           12% 
$j = 4$, commercial banking       15% 
$j = 5$, payment & settlement     18% 
$j = 6$, agency services          15% 
$j = 7$, asset management         12% 
$j = 8$, retail brokerage         12% 
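
A minimal sketch of formula (10.2) with the beta factors of Table 10.1 follows. The function name `standardized_capital` and the gross-income figures are hypothetical; the example is chosen so that a large negative gross income in one business line pulls the whole year's charge below zero, which the formula then floors at zero.

```python
BETA = [0.18, 0.18, 0.12, 0.15, 0.18, 0.15, 0.12, 0.12]  # Table 10.1, j = 1..8


def standardized_capital(gi_by_year, beta=BETA):
    """Standardized charge (10.2).

    gi_by_year: three lists (years t-1, t-2, t-3), each holding gross
    income for the eight business lines in the order of Table 10.1.
    """
    yearly = []
    for gi in gi_by_year:
        charge = sum(b * g for b, g in zip(beta, gi))  # netting across lines
        yearly.append(max(charge, 0.0))                # floor each year at zero
    return sum(yearly) / 3.0


# Hypothetical figures (in millions).
years = [
    [10.0] * 8,                    # year t-1: every line earns 10
    [10.0, -100.0] + [10.0] * 6,   # year t-2: large loss in trading & sales
    [20.0] * 8,                    # year t-3: every line earns 20
]
charge = standardized_capital(years)
print(charge)
```

Unlike the BI approach, the year with a zero charge still counts in the divisor of three.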


In both approaches (BI, S) the Basel Committee expects further guidelines (mainly 
under Pillars II and III) to be adhered to. Also, at national discretion, supervisors 
may adopt slight (often more conservative) changes to aspects of the above rules, the 
latter clearly with a level playing field for the different market participants in mind. 
Widely adopted risk-management rules should be formulated as much as possible in 
such a way as to avoid regulatory arbitrage within and across national jurisdictions. 


10.1.3 Advanced Measurement Approaches 


Under an AM approach, the regulatory capital is determined by a bank’s own inter- 
nal risk-measurement system according to a number of quantitative and qualitative 
criteria set forth in documentation produced by the Basel Committee (Basel Com- 
mittee on Banking Supervision 2004). We will not go into all relevant steps of the 
procedure leading towards the acceptance of an AM approach for an internationally 
active bank and its subsidiaries; the Basel Committee’s documents give a clear and 
readable account of this. We focus instead on the methodological aspects of a full 
quantitative approach to operational risk measurement. It should be stated, how- 
ever, that, as in the case of market and credit risk, the adoption of an AM approach 
to operational risk is subject to approval and continuing quality checking by the 
national supervisor. 

While the BI and S approaches prescribe the explicit formulas (10.1) and (10.2), 
the AM approach lays down general guidelines. In the words of the Basel Committee 
(Basel Committee on Banking Supervision 2004): 


Given the continuing evolution of analytical approaches for operational 
risk, the Committee is not specifying the approach or distributional 
assumptions used to generate the operational risk measure for regula- 
tory capital purposes. However, a bank must be able to demonstrate 
that its approach captures potentially severe “tail” loss events. What- 
ever approach is used, a bank must demonstrate that its operational risk 
measure meets a soundness standard comparable to that of the internal 
ratings-based approach for credit risk (comparable to a one year holding 
period and the 99.9 percent confidence interval). 


In an AM approach, operational losses should be categorized according to the eight 
business lines mentioned in Section 10.1.2 as well as the following seven loss-event 
types: internal fraud; external fraud; employment practices & workplace safety; 
clients, products & business practices; damage to physical assets; business disrup- 
tion & system failures; and execution, delivery & process management. Banks are 
expected to gather internal data on repetitive, high-frequency losses (three to five 
years of data), as well as relevant external data on non-repetitive low-frequency 
losses. Moreover, they must add stress scenarios both at the level of loss severity 
(parameter shocks to model parameters) and correlation between loss types. In the 
absence of detailed joint models for different loss types, risk measures for the aggre- 
gate loss should be calculated by summing across the different loss categories. In 
general, both so-called expected and unexpected losses should be taken into account 
(i.e. risk-measure estimates cannot be reduced by subtraction of an expected loss 
amount). 

We now describe a skeletal version of a typical AM solution for the calculation 
of an operational risk charge for year t. We assume that historical loss data from 
previous years have been collected in a data warehouse with the structure 


$$\{X_k^{t-i,b,\ell} : i = 1, \dots, T;\ b = 1, \dots, 8;\ \ell = 1, \dots, 7;\ k = 1, \dots, N^{t-i,b,\ell}\}, \tag{10.3}$$

where $X_k^{t-i,b,\ell}$ stands for the $k$th loss of type $\ell$ for business line $b$ in year $t - i$; 
$N^{t-i,b,\ell}$ is the number of such losses and $T \geq 5$ years, say. Note that thresholds 
may be imposed for each $(i, b, \ell)$ category and small losses less than the threshold 
may be neglected; a threshold is typically of the order of €10 000. The total historical 
loss amount for business line $b$ in year $t - i$ is obviously 

$$L^{t-i,b} = \sum_{\ell=1}^{7} \sum_{k=1}^{N^{t-i,b,\ell}} X_k^{t-i,b,\ell}, \tag{10.4}$$

and the total loss amount for year $t - i$ is 

$$L^{t-i} = \sum_{b=1}^{8} L^{t-i,b}. \tag{10.5}$$


The problem in the AM approach is to use the loss data to estimate the distribution 
of $L^t$ for year $t$ and to calculate risk measures such as VaR or expected shortfall 
(see Section 2.2) for the estimated distribution. Writing $\varrho_\alpha$ for the risk measure at a 
confidence level $\alpha$, the regulatory capital is determined by 

$$RC_{AM}^{t}(OR) = \varrho_\alpha(L^t), \tag{10.6}$$

where $\alpha$ would typically take a value in the range 0.99-0.999 imposed by the 
local regulator. Because the joint distributional structure of the losses in (10.4) 
and (10.5) for any given year is generally unknown, we would typically resort to 
simple aggregation of risk measures across loss categories to obtain a formula of 
the form 

$$RC_{AM}^{t}(OR) = \sum_{b=1}^{8} \varrho_\alpha(L^{t,b}). \tag{10.7}$$

In view of our discussions in Chapter 6, the choice of an additive rule in (10.7) 
can be understood. Indeed, for any coherent risk measure $\varrho_\alpha$, the right-hand side 
of (10.7) yields an upper bound for the total risk $\varrho_\alpha(L^t)$. In the important case of 
VaR, the right-hand side of (10.7) corresponds to the comonotonic scenario (see 
Proposition 6.15). The optimization results of Section 6.2 can be used to calculate 
bounds for $\varrho_\alpha(L^t)$ under different dependence scenarios for the business lines; see, 
in particular, Example 6.23 and Table 6.1. 

Reduced to its most stylized form in the case when $\varrho_\alpha = \mathrm{VaR}_\alpha$ and $\alpha = 0.999$, 
a capital charge under the AM approach requires the calculation of a quantity of the 
type 

$$\mathrm{VaR}_{0.999}\Bigl( \sum_{k=1}^{N} X_k \Bigr), \tag{10.8}$$

where $(X_k)$ is some sequence of loss severities and $N$ is an rv describing the 
frequency with which operational losses occur. Random sums of the type appearing 
in (10.8) are one of the prime examples of the actuarial models that we treat in 
Section 10.2.2. Before we move on to those models in the next section, we highlight 
some “stylized facts” of operational loss data. 
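
To make the quantity in (10.8) concrete, the following sketch estimates it by plain Monte Carlo. Everything specific here is an assumption made for illustration: the Poisson frequency with 20 losses per year, the Pareto severity with shape 2, the number of simulated years and the helper names. The example uses only the Python standard library, which has no Poisson sampler, so losses are counted as arrivals of a Poisson process on [0, 1].

```python
import math
import random


def poisson(rng, lam):
    """Count the points of a rate-lam Poisson process in [0, 1]."""
    count, t = 0, rng.expovariate(lam)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count


def total_loss(rng, lam, draw_severity):
    """One realization of the random sum X_1 + ... + X_N."""
    return sum(draw_severity(rng) for _ in range(poisson(rng, lam)))


def empirical_var(sample, level):
    """Empirical quantile: smallest sample point with at least level * n points <= it."""
    xs = sorted(sample)
    return xs[math.ceil(level * len(xs)) - 1]


rng = random.Random(1)
# Hypothetical inputs: 20 losses a year on average, Pareto severities.
losses = [total_loss(rng, 20.0, lambda r: r.paretovariate(2.0))
          for _ in range(20000)]
mean_loss = sum(losses) / len(losses)
var_99 = empirical_var(losses, 0.99)
var_999 = empirical_var(losses, 0.999)
print(mean_loss, var_99, var_999)
```

With 20 000 simulated years only about 20 observations lie beyond the 99.9% level, so the estimate of the high quantile is itself very noisy; this is the data problem taken up in Section 10.1.4.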


10.1.4 Operational Loss Data 


In order to reliably estimate (10.6), (10.7) or, in a stylized version, a quantity 
like (10.8), we need extensive data. The data situation for operational risk is much 
worse than that for credit risk, and is clearly an order of magnitude worse than for 
market risk, where vast quantities of data are publicly available. Banks have only 
recently started gathering data and pooling initiatives are in their infancy, so, as far 
as we know, no reliable publicly available data source on operational risk exists. 
Our discussion below is based on some industry data we have been able to analyse 
as well as on the findings in Moscadelli (2004) for the QIS database and the results 
of the 2004 loss-data collection exercise by the Federal Reserve Bank of Boston 
(see Federal Reserve System 2005). An excellent overview of some of the data 
characteristics is to be found in the Basel Committee’s report (Basel Committee 
on Banking Supervision 2003). From the latter report we quote: 


Despite this progress, inferences based on the data should still be made 
with caution. ... In addition, the most recent data collection exercise 
provides data for only one year and, even under the best of circum- 
stances, a one-year collection window will provide an incomplete pic- 
ture of the full range of potential operational risk events, especially of 
rare but significant “tail events”. 


In Figure 10.1 we have plotted operational loss data obtained from several sources; 
parts (a)-(c) show losses for three business lines for the period 1992-2001. It is less 
important for the reader to know the exact loss type; it is sufficient to accept that 
the data are typical for $(b, \ell)$ categories in (10.3). In part (d), the data from the three 
previous figures have been pooled. 

Figure 10.1. Operational risk losses: (a) type 1, n = 162; (b) type 2, n = 80; 
(c) type 3, n = 175; and (d) pooled losses n = 417. 


Exploratory data analysis reveals the following stylized facts (confirmed in several 
other studies): 


• loss severities have a heavy-tailed distribution; 

• losses occur randomly in time; 

• loss frequency may vary substantially over time. 


The third observation is partly explained by the fact that banks have only recently 
started gathering operational risk data prompted by Basel II. There is a considerable 
amount of reporting bias resulting in fewer losses in the first half of the 1990s and 
more losses afterwards. Moreover, several classes of loss may have a considerable 
cyclical component and/or may depend on changing economic covariables. For 
instance, back-office errors may depend on volume traded and fraud may be linked 
to the overall level of the economy (depressions versus boom cycles). This clear 
inhomogeneity in the loss frequency makes an immediate application of statistical 
methodology difficult. However, it may be reasonable to at least assume that the 
(inflation-adjusted) loss sizes have a common severity distribution, which would 
allow, for instance, the application of methods from Chapter 7. 

In Figure 10.2 we have plotted the sample mean excess functions (7.16) for the 
data in Figure 10.1. This figure clearly indicates the first stylized fact of heavy- 
tailed loss severities. The mean excess plots in (a) and (b) are clearly increasing in 
an approximately linear fashion, pointing to Pareto-type behaviour. This contrasts 
with (c), where the plot appears to level off from a threshold of one. This hints at a 
loss distribution with finite upper limit, but this could only be substantiated by more 
detailed knowledge of the type of loss concerned. Pooling the data in (d) masks the 
different kinds of behaviour, and perhaps illustrates the dangers of naive statistical 
analyses that do not consider the data-generating mechanism. 
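
The sample mean excess function behind such plots is simple to compute: for a threshold $v$ it is the average of $X_i - v$ over the observations exceeding $v$. The sketch below is a minimal version with hypothetical function names and toy data; it evaluates the function at each distinct observed value below the maximum, which is one common choice of plotting points.

```python
def mean_excess(data, threshold):
    """Sample mean excess: average of (x - threshold) over x > threshold."""
    excesses = [x - threshold for x in data if x > threshold]
    if not excesses:
        raise ValueError("no observations above the threshold")
    return sum(excesses) / len(excesses)


def mean_excess_points(data):
    """(threshold, mean excess) pairs at each distinct value below the maximum."""
    thresholds = sorted(set(data))[:-1]
    return [(v, mean_excess(data, v)) for v in thresholds]


# Toy data for illustration only.
points = mean_excess_points([1.0, 2.0, 4.0, 8.0])
for v, e in points:
    print(v, e)
```

Plotting the second coordinate against the first gives a mean excess plot; an upward, roughly linear trend is the pattern associated with Pareto-type tails.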

Figure 10.2. Corresponding sample mean excess plots for the data in Figure 10.1: 
(a) type 1; (b) type 2; (c) type 3; (d) pooled. 

Moscadelli (2004) performed a detailed EVT analysis (including a first attempt to 
solve the frequency problem) of the full QIS data set of more than 47 000 operational 
losses and concluded that the loss dfs are well fitted by generalized Pareto 
distributions (GPDs) in the upper-tail area (see Section 7.2.2 for the necessary statistical 
background). The estimated tail parameters ($\xi$ in (7.14)) for the different business 
lines range from 0.85 for asset management to 1.39 for commercial banking. Six 
of the business lines have an estimate of $\xi$ greater than one, corresponding to an 
infinite mean model! Based on these QIS data, the estimated RC/GI ratios (the $\beta$ in 
Table 10.1) range from 8.3% for retail banking to 33.3% for payment & settlement, 
with an overall alpha value (see (10.1)) of 13.3%, slightly below the Basel II value 
of 15% used in the BI approach. Note the much broader range of values of the $\beta$ 
emerging from the analysis of the QIS data compared with the prescribed range of 
12-18% for the standardized approach in Table 10.1. 

As more data become available, more conclusive analyses may be possible. It 
is clear, however, from Moscadelli (2004) that the GPD method of Section 7.2 
is one of the most useful statistical tools at our disposal and yields a fit that is 
superior to other loss distributions in the high-tail area; this has been corroborated by 
several practitioners from the banking and insurance industry. In view of the heavy- 
tailedness of the data, and the necessity of calculating capital charges corresponding 
to high quantiles, it seems very natural to use EVT methodology. 


Notes and Comments 


Several textbooks on operational risk have been published: see, for example, Cruz 
(2002, 2004), King (2001), the Risk Books publication edited by Risk Books (2003) 
and chapters in Ong (2004) and Crouhy, Galai and Mark (2001). In particular, 
Chapter 4 of Cruz (2004), written by Carolyn Currie, gives an excellent overview 
of the regulatory issues surrounding operational risk. 

A practical implementation is discussed in Ebnöther et al. (2003). Frachot, 
Georges and Roncalli (2001) discuss the loss-distribution approach to operational 
risk. Döbeli, Leippold and Vanini (2003) elaborate on the way in which a good 
operational risk framework may lead to an overall improvement in quality of the 
business operations. 

Figure 10.1 is taken from Embrechts, Kaufmann and Samorodnitsky (2004). The 
latter paper also stresses the important difference between so-called repetitive and 
non-repetitive losses. For the former (to some extent less important) losses, statis- 
tical modelling can be very useful. For non-repetitive, low-probability, high-severity 
losses, much more care has to be taken before a statistical analysis can be performed 
(see Pézier 2002a,b). 

EVT methods for operational risk quantification have been used by numerous 
authors (see, for example, Coleman 2002, 2003; Medova 2000a,b). Because of the 
non-stationarity of operational loss data over several years, more refined EVT models 
are called for. See, for example, Chavez-Demoulin and Embrechts (2004) and 
Chavez-Demoulin, Embrechts and Nešlehová (2005) for some examples of such models. For 
a critical article on the use of EVT for the calculation of an operational risk-capital 
charge, see Embrechts, Furrer and Kaufmann (2003), which contains a simulation 
study of the number of data needed to come up with a reasonable estimate of a high 
quantile. The use of statistical methods other than EVT is discussed in the textbooks 
referred to above. These methods include linear predictive models, Bayesian belief 
networks and discriminant analysis. Excellent data-analytic papers using published 
operational risk losses are deFontnouvelle et al. (2003) and Moscadelli (2004). 
Finally, recall from Notes and Comments of Section 6.2 the paper by Rosenberg 
and Schuermann (2004), which addresses the aggregation of market, credit and 
operational risk measures. 


10.2 Elements of Insurance Analytics 
10.2.1 The Case for Actuarial Methodology 


Actuarial tools and techniques for the modelling, pricing and reserving of insurance 
products in the traditional fields of life, non-life and reinsurance have a long history 
going back more than a century. More recently, the border between financial and 
insurance products has become blurred, examples of this process being equity-linked 
life products and alternative risk-transfer vehicles (see Section 1.5.2 and Notes and 
Comments of that chapter). 

Whereas some of the combined bank-assurance products have not met with the 
success that was originally hoped for, it remains true that there exists an increasing 
need for financial and actuarial professionals who can close the methodological gaps 
between the two fields. In the sections that follow we discuss insurance analytical 
tools that we believe the more traditional finance-oriented risk manager ought to be 
aware of; the story behind the name insurance analytics can be found in Embrechts 
(2002). 

It is not only the occasional instance of joint product development between the 
banking and insurance worlds that prompts us to make a case for actuarial 
methodology in QRM, but also the observation that many of the concepts and techniques 
of QRM described in the preceding chapters are in fact borrowed from the actuarial 
literature. 


• Risk measures like expected shortfall (Definition 2.15) have been studied in 
  a systematic way in the insurance literature. Expected shortfall is also the 
  standard risk measure to be used under the Solvency 2 guidelines. 

• Many of the dependence modelling tools presented in Chapter 5 saw their first 
  applications in the realm of insurance. Moreover, notions like comonotonicity 
  of risk factors have their origins in actuarial questions. 

• In Section 6.1 we discussed the axiomatization of financial risk measures and 
  pointed at the parallel development of insurance premium principles (often 
  with very similar goals and results). 

• The statistical modelling of extremal events has been a bread-and-butter 
  subject for actuaries since the start of insurance. Hence, many of the tools 
  presented in Chapter 7 are well known to actuaries. 

• Within the world of credit risk management, the industry model CreditRisk+ 
  (Section 8.4.2) is known as an actuarial model. 

• The actuarial approach to the modelling of operational risk is apparent in the 
  AM approach of Section 10.1.3. 


In the sections that follow, we give a brief discussion of relevant actuarial techniques. 
The material presented should enable the reader to transfer actuarial concepts to 
QRM in finance more easily. We do not strive for a full treatment of relevant tools 
as these could fill a separate (voluminous) textbook (see, for example, Denuit and 
Charpentier (2004), Mikosch (2004) and Partrat and Besson (2004) for excellent 
accounts of many of the relevant techniques). 


10.2.2 The Total Loss Amount 


Reconsider formula (10.8), where a random number $N$ of random losses or 
severities $X_k$ occurring in a given time period are summed. To apply a risk measure like 
VaR we need to make assumptions about the $(X_k)$ and $N$, which leads us to one of 
the fundamental concepts of (non-life) insurance mathematics. 


Definition 10.1 (total loss amount and distribution). Denote by $N(t)$ the 
(random) number of losses over a fixed time period $[0, t]$ and write $X_1, X_2, \dots$ for the 
individual losses. The total loss amount (or aggregate loss) is defined as 

$$S_{N(t)} = \sum_{k=1}^{N(t)} X_k, \tag{10.9}$$

with df $F_{S_{N(t)}}(x) = P(S_{N(t)} \leq x)$, the total (or aggregate) loss df. Whenever $t$ is 
fixed, $t = 1$ say, we may drop the time index from the notation and simply write $S_N$ 
and $F_{S_N}$. 


Remark 10.2. The definition of (10.9) as an rv is to be understood as 
$S_{N(t)}(\omega) = \sum_{k=1}^{N(t)(\omega)} X_k(\omega)$, $\omega \in \Omega$, and is referred to as a random (or 
randomly indexed) sum. 

A prime goal of this section will be the analytical and numerical calculation 
of $F_{S_N}$, which requires further assumptions about $(X_k)$ and $N$. 


Assumption 10.3 (independence, compound sums). We assume that the rvs $(X_k)$ 
are iid with common df $G$, $G(0) = 0$. We further assume that the rvs $N$ and $(X_k)$ 
are independent; in that case we refer to (10.9) as a compound sum. The probability 
mass function of $N$ is denoted by $p_N(k) = P(N = k)$, $k = 0, 1, 2, \dots$. The rv $N$ 
is referred to as the compounding rv. 


Proposition 10.4 (compound distribution). Let $S_N$ be a compound sum and 
suppose that Assumption 10.3 holds. Then, for all $x \geq 0$, 

$$F_{S_N}(x) = P(S_N \leq x) = \sum_{k=0}^{\infty} p_N(k) G^{(k)}(x), \tag{10.10}$$

where $G^{(k)}(x) = P(S_k \leq x)$, the $k$th convolution of $G$. Note that $G^{(0)}(x) = 1$ for 
$x \geq 0$, and $G^{(0)}(x) = 0$ for $x < 0$. 


Proof. Suppose $x \geq 0$. Then 

$$F_{S_N}(x) = \sum_{k=0}^{\infty} P(S_N \leq x \mid N = k) P(N = k) = \sum_{k=0}^{\infty} p_N(k) G^{(k)}(x).$$


Although formula (10.10) is explicit, its actual calculation in specific cases is 
difficult because the convolution powers $G^{(k)}$ of a df $G$ are in general not 
available in closed form. Hence, one resorts to (numerical) approximation methods. 
A first class of these uses the fact that the Laplace-Stieltjes transform of a 
convolution is the product of the Laplace-Stieltjes transforms. Using the usual notation 
$\hat{F}(s) = \int_0^{\infty} e^{-sx}\,dF(x)$, where $s \geq 0$, for Laplace-Stieltjes transforms, we have 
that $\widehat{G^{(k)}}(s) = (\hat{G}(s))^k$. It follows from Proposition 10.4 that 

$$\hat{F}_{S_N}(s) = \sum_{k=0}^{\infty} p_N(k) \hat{G}^k(s) = \Pi_N(\hat{G}(s)), \quad s \geq 0, \tag{10.11}$$

where $\Pi_N$ denotes the probability-generating function of $N$, $\Pi_N(s) = E(s^N)$. 
Example 10.5 (the compound Poisson df). Suppose that $N$ has a Poisson df with 
intensity parameter $\lambda > 0$, denoted $N \sim \mathrm{Poi}(\lambda)$. In that case, $p_N(k) = e^{-\lambda} \lambda^k / k!$, 
$k \geq 0$, and, for $s \in \mathbb{R}$, 

$$\Pi_N(s) = E(s^N) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} s^k = \exp(-\lambda(1 - s)).$$

Hence from (10.11) it follows that, for $s \geq 0$, 

$$\hat{F}_{S_N}(s) = \exp(-\lambda(1 - \hat{G}(s))).$$

In this case, the df of $S_N$ is referred to as the compound Poisson df and we write 
$S_N \sim \mathrm{CPoi}(\lambda, G)$. Formula (10.11) facilitates the calculation of moments of $S_N$ 
and lends itself to numerical evaluation through Fourier inversion, for which the 
fast Fourier transform (FFT) is the standard algorithm (see Notes and Comments for 
references on the latter). For the calculation of moments, note that, under the 
assumption of the existence of sufficiently high moments and hence differentiability 
of $\hat{G}$ and $\Pi_N$, we obtain 

$$\frac{d^k}{ds^k} \Pi_N(s) \Big|_{s=1} = E(N(N-1) \cdots (N-k+1))$$

and 

$$(-1)^k \frac{d^k}{ds^k} \hat{G}(s) \Big|_{s=0} = E(X_1^k) = \mu_k.$$
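
The FFT route mentioned in Example 10.5 can be sketched as follows, under a simplifying assumption that is ours and not the text's: the severity df is replaced by a probability mass function on the integer grid 0, 1, 2, ..., so that the discrete Fourier transform plays the role of the Laplace-Stieltjes transform. The transform of the compound Poisson pmf is then $\exp(-\lambda(1 - \hat{g}))$ evaluated pointwise, and an inverse transform recovers the pmf. The pure-Python radix-2 `fft` helper and the function name `compound_poisson_pmf` are illustrative only.

```python
import cmath
import math


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two (unnormalized)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def compound_poisson_pmf(lam, severity_pmf, n):
    """Approximate pmf of S_N on {0, ..., n-1} for a severity pmf on the integers."""
    if n < 1 or n & (n - 1) or len(severity_pmf) > n:
        raise ValueError("n must be a power of two and at least len(severity_pmf)")
    g = list(severity_pmf) + [0.0] * (n - len(severity_pmf))
    g_hat = fft(g)
    s_hat = [cmath.exp(-lam * (1.0 - z)) for z in g_hat]   # transform of S_N
    return [z.real / n for z in fft(s_hat, invert=True)]  # inverse DFT


lam = 2.0
# Check: if every loss equals 1, S_N = N, so the result must be the Poisson pmf.
pmf = compound_poisson_pmf(lam, [0.0, 1.0], 64)
print(abs(pmf[3] - math.exp(-lam) * lam ** 3 / math.factorial(3)))
```

Probability mass beyond the grid wraps around (aliasing), so `n` has to be large enough that the tail of $S_N$ beyond `n - 1` is negligible; for heavy-tailed severities this calls for care.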


Example 10.6 (continuation of Example 10.5). In the case of the compound 
Poisson df, one obtains 

$$E(S_N) = (-1) \frac{d}{ds} \hat{F}_{S_N}(s) \Big|_{s=0} = \exp(-\lambda(1 - \hat{G}(0))) \lambda (-\hat{G}'(0)) = \lambda \mu_1 = E(N) E(X_1).$$

Similar calculations yield $\mathrm{var}(S_N) = E(S_N^2) - (E(S_N))^2 = \lambda \mu_2$. 


For the general compound case one obtains the following useful result. 


Proposition 10.7 (moments of compound dfs). Under Assumption 10.3 and 
assuming that $E(N^2) < \infty$, $\mu_2 < \infty$, we have that 

$$E(S_N) = E(N) E(X_1) \quad \text{and} \quad \mathrm{var}(S_N) = \mathrm{var}(N) (E(X_1))^2 + E(N) \mathrm{var}(X_1). \tag{10.12}$$


Proof. This follows readily from (10.11), differentiating with respect to $s$. The 
following direct proof avoids the use of transforms. Conditioning on $N$ and using 
Assumption 10.3, one obtains 

$$E(S_N) = E(E(S_N \mid N)) = E\Bigl( E\Bigl( \sum_{k=1}^{N} X_k \Bigm| N \Bigr) \Bigr) = E\Bigl( \sum_{k=1}^{N} E(X_k) \Bigr) = E(N) E(X_1)$$

and, similarly, 

$$E(S_N^2) = E\Bigl( E\Bigl( \Bigl( \sum_{k=1}^{N} X_k \Bigr)^2 \Bigm| N \Bigr) \Bigr) = E\Bigl( E\Bigl( \sum_{k=1}^{N} \sum_{\ell=1}^{N} X_k X_\ell \Bigm| N \Bigr) \Bigr)$$
$$= E(N \mu_2 + N(N-1) \mu_1^2) = E(N) \mu_2 + (E(N^2) - E(N)) \mu_1^2$$
$$= E(N) \mathrm{var}(X_1) + E(N^2) (E(X_1))^2,$$

so $\mathrm{var}(S_N) = E(S_N^2) - (E(S_N))^2 = E(N) \mathrm{var}(X_1) + \mathrm{var}(N) (E(X_1))^2$. 

Remark 10.8. Formula (10.12) elegantly combines the randomness of the frequency 
($\mathrm{var}(N)$) with that of the severity ($\mathrm{var}(X_1)$). In the compound Poisson case it reduces 
to the formula $\mathrm{var}(S_N) = \lambda E(X_1^2) = \lambda \mu_2$, as in Example 10.6. In the 
deterministic-sum case, when $P(N = n) = 1$, say, we find the well-known results $E(S_N) = n \mu_1$ 
and $\mathrm{var}(S_N) = n\,\mathrm{var}(X_1)$; indeed, in this degenerate case, $\mathrm{var}(N) = 0$. 
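
The two special cases of Remark 10.8 are easy to check numerically from (10.12). The helper name `compound_moments` and the parameter values below are ours, chosen purely for illustration.

```python
def compound_moments(mean_n, var_n, mean_x, var_x):
    """Mean and variance of S_N from (10.12)."""
    mean_s = mean_n * mean_x
    var_s = var_n * mean_x ** 2 + mean_n * var_x
    return mean_s, var_s


# Poisson frequency (mean = variance = 100), Exp(1) severity (mean = variance = 1):
# the variance equals lambda * mu_2 = 100 * 2.
print(compound_moments(100.0, 100.0, 1.0, 1.0))

# Deterministic N = 10 with the same severity: the var(N) term vanishes.
print(compound_moments(10.0, 0.0, 1.0, 1.0))
```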


The compound Poisson model is a basic model for aggregate financial or 
insurance risk losses. The ubiquity of the Poisson distribution in insurance can 
be understood as follows. Consider a time interval $[0, 1]$ and let $N$ denote the total 
number of losses in that interval. Suppose further that we have a number of 
potential loss generators (transactions, credit positions, insurance policies, etc.) that can 
produce, with probability $p_n$, one loss or, with probability $1 - p_n$, no loss in each 
small subinterval $((k-1)/n, k/n]$ for $k = 1, \dots, n$. Moreover, suppose that the 
occurrence or non-occurrence of a loss in any particular subinterval is not 
influenced by the occurrence of losses in other intervals. Then the number $N_n$ of losses 
has a binomial df with parameters $n$ and $p_n$, so 

$$P(N_n = k) = \binom{n}{k} p_n^k (1 - p_n)^{n-k}, \quad k = 0, \dots, n.$$

Combined with a loss-severity distribution this frequency distribution gives rise, 
in (10.10), to the so-called binomial loss model. Next suppose that $n \to \infty$ in such 
a way that $\lim_{n \to \infty} n p_n = \lambda > 0$. It follows from Poisson's theorem of rare events 
(see also Section 7.4.1) that 

$$\lim_{n \to \infty} P(N_n = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \dots,$$

i.e. $N_\infty \sim \mathrm{Poi}(\lambda)$, explaining why the Poisson model assumption is very natural as 
a frequency distribution and the compound Poisson model is a common aggregate 
loss model. The compound Poisson model has several nice properties, one of which 
concerns aggregation and is useful in the operational risk context in situations such 
as (10.5). 


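Proposition 10.9 (sums of compound Poisson rvs). Suppose that the compound 
sums $S_{N_i} \sim \mathrm{CPoi}(\lambda_i, G_i)$, $i = 1, \dots, d$, and that these rvs are independent. Then 
$S_N = \sum_{i=1}^{d} S_{N_i} \sim \mathrm{CPoi}(\lambda, G)$, where $\lambda = \sum_{i=1}^{d} \lambda_i$ and $G = \sum_{i=1}^{d} (\lambda_i / \lambda) G_i$. 


Proof. (For $d = 2$, the general case being similar.) Because of independence and 
Example 10.5 we have, for the Laplace-Stieltjes transform of $S_N$, 

$$\hat{F}_{S_N}(s) = \hat{F}_{S_{N_1}}(s) \hat{F}_{S_{N_2}}(s)$$
$$= \exp\Bigl( -(\lambda_1 + \lambda_2) \Bigl( 1 - \frac{1}{\lambda_1 + \lambda_2} (\lambda_1 \hat{G}_1(s) + \lambda_2 \hat{G}_2(s)) \Bigr) \Bigr)$$
$$= \exp(-\lambda(1 - \hat{G}(s))),$$

where $\lambda = \lambda_1 + \lambda_2$ and 

$$G = \frac{\lambda_1}{\lambda_1 + \lambda_2} G_1 + \frac{\lambda_2}{\lambda_1 + \lambda_2} G_2.$$

The result follows since the Laplace-Stieltjes transform uniquely determines the 
underlying df. 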


Hence the new intensity $\lambda$ is just the sum of the old ones, whereas the new severity 
df $G$ is a discrete mixture of the loss dfs $G_i$ with weights $\lambda_i / \lambda$, $i = 1, \dots, d$. We 
can easily simulate losses from such a model through a two-stage procedure: first 
draw $i$ ($i = 1, \dots, d$) with probability $\lambda_i / \lambda$, and then draw a loss with df $G_i$. 
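
The two-stage procedure can be written down directly. The sketch below assumes a hypothetical two-cell portfolio (intensities 3 and 1, exponential and Pareto severities); the function name and all parameter values are illustrative only.

```python
import random


def draw_merged_loss(rng, lams, samplers):
    """One loss with df G = sum_i (lam_i / lam) G_i."""
    i = rng.choices(range(len(lams)), weights=lams)[0]  # stage 1: pick cell i
    return samplers[i](rng)                             # stage 2: draw from G_i


# Hypothetical two-cell example: intensities 3 and 1,
# Exp(1) severities (mean 1) and Pareto(3) severities (mean 1.5).
lams = [3.0, 1.0]
samplers = [lambda r: r.expovariate(1.0), lambda r: r.paretovariate(3.0)]

rng = random.Random(7)
draws = [draw_merged_loss(rng, lams, samplers) for _ in range(50000)]
sample_mean = sum(draws) / len(draws)
mixture_mean = (3.0 * 1.0 + 1.0 * 1.5) / 4.0  # weights 3/4 and 1/4
print(sample_mean, mixture_mean)
```

`random.choices` normalizes the weights itself, so the intensities can be passed without dividing by their sum.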


Beyond the Poisson model. The Poisson model can serve as a stylized 
representation of the loss-generating mechanism from which more realistic models can be 
derived. For instance, we may wish to introduce a time parameter in $N$ to 
capture different occurrence patterns over time (see Section 10.2.6). Also, the intensity 
parameter $\lambda$ may be assumed to be random (see Example 10.20). Indeed, a 
further step is to turn $\lambda$ into a stochastic process, which gives rise to such models as 
doubly stochastic (or Cox) processes (see Section 9.2.3) or self-exciting processes, 
as encountered in Section 7.4.3. Furthermore, various forms of dependence among 
the $X_k$ rvs or between $N$ and $(X_k)$ could be modelled. Finally, multiline portfolios 
require multivariate models for vectors of the type $(S_{N_1}, \dots, S_{N_d})$. An ultimate 
goal of the AM approach to operational risk would be to model such random vectors 
where, for instance, $d$ might stand for seven risk types, eight business lines, 
or in total 56 loss category cells. 


10.2.3 Approximations and Panjer Recursion 


As mentioned in Section 10.2.2, the analytic calculation of $F_{S_N}$ is not possible
for the majority of reasonable models, which has led actuaries to come up with
several numerical approximations. Below we review some of these approximations
and illustrate their use for several choices of the severity df $G$. The basic example
we look at is the compound Poisson case, $S_N \sim \mathrm{CPoi}(\lambda, G)$, though most of the
approximations discussed can be adjusted to deal with other distributions for $N$.
Given $\lambda$ and $G$ we can easily simulate from $F_{S_N}$ and, by repeating this many times, we can
get an empirical estimate that is close to the true df. Figure 10.3 contains a simulation
of n = 100000 realizations of $S_N \sim \mathrm{CPoi}(100, \mathrm{Exp}(1))$. Although the histogram
exhibits mild skewness (which can easily be shown theoretically; see (10.15)),
a clear central limit effect takes place. This is used in the first approximation below.
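A small Monte Carlo experiment along these lines might look as follows. It is a sketch, not the code behind Figure 10.3: the function names are hypothetical, the Poisson count is generated by counting exponential inter-event times in [0, 1] (one of several possible samplers), and the number of replications is reduced to 10000 to keep the run short.

```python
import random

def simulate_compound_poisson(lam, draw_loss, n_sims, seed=0):
    """Draw n_sims realizations of S_N with N ~ Poi(lam) and iid losses from draw_loss."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        # N ~ Poi(lam): count Exp(lam) inter-event times that fall in [0, 1].
        n_losses, t = 0, rng.expovariate(lam)
        while t <= 1.0:
            n_losses += 1
            t += rng.expovariate(lam)
        totals.append(sum(draw_loss(rng) for _ in range(n_losses)))
    return totals

def sample_moments(xs):
    """Empirical mean, variance and skewness coefficient."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    skew = sum((x - mean) ** 3 for x in xs) / n / var ** 1.5
    return mean, var, skew

# CPoi(100, Exp(1)) as in the text, with fewer replications than in Figure 10.3.
totals = simulate_compound_poisson(100.0, lambda rng: rng.expovariate(1.0), n_sims=10000)
mean, var, skew = sample_moments(totals)
```

For this model the theoretical mean is 100 and the variance is 200, and the empirical skewness should come out small but positive.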


Normal approximation. As the loss rvs $X_i$ are iid (with finite second moment,
say) and $S_N$ is a (random) sum of the $X_i$ variables, one can apply Theorem 2.5.16
in Embrechts, Klüppelberg and Mikosch (1997) and Proposition 10.7 to obtain the
following approximation, for general $N$:

$$
F_{S_N}(x) \approx \Phi\left( \frac{x - E(N)E(X_1)}{\sqrt{\operatorname{var}(N)(E(X_1))^2 + E(N)\operatorname{var}(X_1)}} \right). \tag{10.13}
$$


Here, and in the approximations below, “$\approx$” has no specific mathematical interpretation
beyond “there exists a limit result justifying the right-hand side to be used as
approximation of the left-hand side”. In particular, for the compound Poisson case
above, (10.13) reduces to

$$
F_{S_N}(x) \approx \Phi\left( \frac{x - \lambda E(X_1)}{\sqrt{\lambda E(X_1^2)}} \right), \tag{10.14}
$$

where $\Phi$ is the standard normal df, as usual. It is this normal approximation that is
superimposed on the histogram in Figure 10.3. Clearly, there are conditions underlying
the approximation (10.13): for example, claims should not be too heavy-tailed
(see Theorem 10.21).

Figure 10.3. Histogram of simulated compound loss data (n = 100000) for
$S_N \sim \mathrm{CPoi}(100, \mathrm{Exp}(1))$ together with normal approximation (10.14).
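As a quick numerical illustration of the compound Poisson normal approximation, the following sketch evaluates it for the CPoi(100, Exp(1)) example, for which $E(X_1) = 1$ and $E(X_1^2) = 2$. The function names are hypothetical, and the standard normal df is computed from the error function.

```python
import math

def normal_cdf(z):
    """Standard normal df, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cpoi_normal_approx(x, lam, ex1, ex2):
    """Normal approximation to the df of S_N ~ CPoi(lam, G):
    Phi((x - lam * E(X_1)) / sqrt(lam * E(X_1^2)))."""
    return normal_cdf((x - lam * ex1) / math.sqrt(lam * ex2))

# CPoi(100, Exp(1)): approximate probability that the aggregate loss exceeds 130.
tail_130 = 1.0 - cpoi_normal_approx(130.0, 100.0, 1.0, 2.0)
```

The approximation depends on the severity df only through its first two moments, which is why it cannot reflect skewness or tail heaviness.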

For $\mathrm{CPoi}(\lambda, G)$ it is not difficult to show that the skewness parameter satisfies

$$
\frac{E\bigl((S_N - E(S_N))^3\bigr)}{(\operatorname{var}(S_N))^{3/2}} = \frac{E(X_1^3)}{\sqrt{\lambda (E(X_1^2))^3}} > 0 \tag{10.15}
$$

(note that $X_i \ge 0$ almost surely), so an approximation by a df with positive skewness
may improve the approximation (10.14), especially in the tail area. This is indeed
the case and leads to the next approximation.


Translated-gamma approximation. We approximate $S_N$ by $k + Y$, where $k$ is
a translation parameter and $Y \sim \mathrm{Ga}(\alpha, \beta)$ has a gamma distribution (see
Section A.2.4). The parameters $(k, \alpha, \beta)$ are found by matching the mean, the variance
and the skewness of $k + Y$ and $S_N$. It is not difficult to check that the following
equations result:

$$
k + \frac{\alpha}{\beta} = \lambda E(X_1), \qquad \frac{\alpha}{\beta^2} = \lambda E(X_1^2), \qquad \frac{2}{\sqrt{\alpha}} = \frac{E(X_1^3)}{\sqrt{\lambda (E(X_1^2))^3}}.
$$

In our case, where $\lambda = 100$ and $X_1$ has a standard exponential distribution, these
yield the equations $k + \alpha/\beta = 100$, $\alpha/\beta^2 = 200$ and $2/\sqrt{\alpha} = 0.2121$ with solution
$\alpha = 88.89$, $\beta = 0.67$, $k = -32.72$.
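The moment matching for the translated-gamma approximation can be carried out mechanically. The sketch below (function name hypothetical) solves for the three parameters from the first three moments of the severity; it assumes the gamma distribution is parametrized by shape $\alpha$ and rate $\beta$, so that its mean is $\alpha/\beta$, its variance $\alpha/\beta^2$ and its skewness $2/\sqrt{\alpha}$.

```python
import math

def translated_gamma_params(lam, ex1, ex2, ex3):
    """Match mean, variance and skewness of k + Y, Y ~ Ga(alpha, beta),
    to those of S_N ~ CPoi(lam, G) with severity moments ex1, ex2, ex3."""
    skew = ex3 / math.sqrt(lam * ex2 ** 3)   # skewness of the compound Poisson sum
    alpha = (2.0 / skew) ** 2                # from 2 / sqrt(alpha) = skew
    beta = math.sqrt(alpha / (lam * ex2))    # from alpha / beta^2 = lam * E(X^2)
    k = lam * ex1 - alpha / beta             # from k + alpha / beta = lam * E(X)
    return k, alpha, beta

# Standard exponential severity: E(X) = 1, E(X^2) = 2, E(X^3) = 6.
k, alpha, beta = translated_gamma_params(100.0, 1.0, 2.0, 6.0)
```

Solved without intermediate rounding, the equations give $\alpha = 800/9$, $\beta = 2/3$ and $k = -100/3$, that is, about $-33.33$. The slightly different value of $k$ quoted in the text is presumably the effect of rounding $\beta$ before computing $k$.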


Commentary on these approximations. Both approximations work reasonably well
in the bulk of the data. However, for risk-management purposes, we are mainly
interested in upper tail risk; in Figure 10.4 we have therefore plotted both approximations
for $x \ge 120$ on a log-log scale. This corresponds to the tail area beyond
the 90% quantile of $F_{S_N}$. Similar plots were routinely used in Chapter 7 on EVT
(see, for example, Figure 7.6). It becomes clear that, as can be expected, the gamma
approximation works better in this upper tail area where the normal approximation
underestimates the loss potential.

Figure 10.4. Simulated CPoi(100, Exp(1)) data together with normal- and translated-gamma
approximations (log-log scale). The 99.9% quantile estimates are also given.

Of course, for loss data with heavier tails than exponential (lognormal or Pareto, 
say), even the translated-gamma approximation will be insufficient and other approx- 
imations can be devised based on heavier-tailed distributions, such as translated F, 
inverse gamma or generalized Pareto. 

Another approach could be based on Monte Carlo simulation of aggregate losses
$S_N$ to which an appropriate heavy-tailed loss distribution could then be fitted.
One possible approach would be to model the tail of these simulated compound 
losses with the GPD using the methodology of Section 7.2.2. This is what has 
been done in Figures 10.5 and 10.6, where we have plotted various approxima- 
tions for CPoi(100, LN (1, 1)) and CPoi(100, Pa(4, 1)). The former corresponds to 
a standard industry model for operational risk (see Frachot 2004). The latter corre- 
sponds to a class of operational risk models used in Moscadelli (2004). From these 
figures the message is clear: if the data satisfy the compound Poisson assumption, 
then the GPD yields a superior fit for high quantiles. 

We now turn to an important class of approximations based on recursive methods.
In the case where the loss sizes $(X_i)$ are discrete and the distribution function of $N$
satisfies a specific condition (see Definition 10.10 below) a reliable recursive method
can be worked out.

Suppose that $X_1$ has a discrete distribution so that $P(X_1 \in \mathbb{N}_0) = 1$ with $g_k = P(X_1 = k)$,
$p_k = P(N = k)$ (for notational convenience we write $p_k$ for $p_N(k)$)
and $s_k = P(S_N = k)$. For simplicity assume that $g_0 = 0$ and let $g_k^{(n)} = P(X_1 + \cdots + X_n = k)$,
the discrete convolution of the probability mass function $g_k$. Note
that, by definition, $g_k^{(n+1)} = \sum_{i=0}^{k} g_i^{(n)} g_{k-i}$. We immediately obtain the following
identities:

$$
\begin{aligned}
s_0 &= P(S_N = 0) = P(N = 0) = p_0, \\
s_n &= P(S_N = n) = \sum_{k=1}^{\infty} p_k\, g_n^{(k)}, \quad n \ge 1,
\end{aligned} \tag{10.16}
$$

where the latter formula corresponds to Proposition 10.4 but now in the discrete
case. As in Proposition 10.4 we note that (10.16) is difficult to calculate, mainly due
to the convolutions $g_n^{(k)}$. However, for an important class of counting variables $N$,
(10.16) can be reduced to a simple recursion. For this, we introduce the so-called
Panjer classes.

Figure 10.5. Simulated CPoi(100, LN(1, 1)) data (n = 100000) with normal-, translated-gamma,
GPD and Panjer recursion (see Example 10.17) approximations (on log-log scale).

Figure 10.6. Simulated CPoi(100, Pa(4, 1)) data (n = 100000) with normal-,
translated-gamma, and GPD approximations (on log-log scale).


Definition 10.10 (Panjer class). The probability mass function $(p_k)$ of $N$ belongs
to the Panjer(a, b) class for some $a, b \in \mathbb{R}$ if the following relationship holds for
$r \ge 1$: $p_r = (a + (b/r))\, p_{r-1}$.


Example 10.11 (binomial). If $N \sim B(n, p)$, then its probability mass function is
$p_r = \binom{n}{r} p^r (1 - p)^{n-r}$ for $0 \le r \le n$ and it can be easily checked that

$$
\frac{p_r}{p_{r-1}} = -\frac{p}{1 - p} + \frac{(n + 1)p}{r(1 - p)},
$$

showing that $N$ belongs to the Panjer(a, b) class with $a = -p/(1 - p)$ and
$b = (n + 1)p/(1 - p)$.


Example 10.12 (Poisson). If $N \sim \mathrm{Poi}(\lambda)$, then its probability mass function
$p_r = e^{-\lambda}\lambda^r/r!$ satisfies $p_r/p_{r-1} = \lambda/r$, so $N$ belongs to the Panjer(a, b) class
with $a = 0$ and $b = \lambda$.


Example 10.13 (negative binomial). If $N$ has a negative binomial distribution,
denoted $N \sim \mathrm{NB}(\alpha, p)$, then its probability mass function is

$$
p_r = \binom{\alpha + r - 1}{r} p^{\alpha} (1 - p)^r, \quad r \ge 0,\ \alpha > 0,\ 0 < p < 1
$$

(see Section A.2.7 for further details). We can easily check that

$$
\frac{p_r}{p_{r-1}} = 1 - p + \frac{(\alpha - 1)(1 - p)}{r}.
$$

Hence $N$ belongs to the Panjer(a, b) class with $a = 1 - p$ and $b = (\alpha - 1)(1 - p)$.
In Proposition 10.20 we will show that the negative binomial model follows very
naturally from the Poisson model when one randomizes the intensity parameter of
the latter using a gamma distribution.


Remark 10.14. One can show that, neglecting degenerate models for $(p_k)$, the above
three examples are the only counting distributions satisfying Definition 10.10. This
result goes back to Johnson and Kotz (1969) and was formulated explicitly in the
actuarial literature in Sundt and Jewell (1982).
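The three (a, b) pairs derived in Examples 10.11 to 10.13 can be checked numerically. The sketch below uses hypothetical helper names and arbitrary parameter values; it simply tests the relation $p_r = (a + b/r) p_{r-1}$ of Definition 10.10 over a finite range of $r$.

```python
import math

def panjer_ratio_ok(pmf, a, b, r_max, tol=1e-10):
    """Check p_r = (a + b / r) * p_{r-1} for r = 1, ..., r_max."""
    return all(abs(pmf(r) - (a + b / r) * pmf(r - 1)) <= tol
               for r in range(1, r_max + 1))

def binom_pmf(n, p):
    return lambda r: math.comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(lam):
    return lambda r: math.exp(-lam) * lam ** r / math.factorial(r)

def negbin_pmf(alpha, p):
    # Generalized binomial coefficient written with gamma functions,
    # so that non-integer alpha is allowed.
    return lambda r: math.exp(math.lgamma(alpha + r) - math.lgamma(alpha)
                              - math.lgamma(r + 1)) * p ** alpha * (1 - p) ** r

# Illustrative parameter values (not from the text).
ok_binomial = panjer_ratio_ok(binom_pmf(10, 0.3), -0.3 / 0.7, 11 * 0.3 / 0.7, 10)
ok_poisson = panjer_ratio_ok(poisson_pmf(2.5), 0.0, 2.5, 30)
ok_negbin = panjer_ratio_ok(negbin_pmf(2.5, 0.4), 1 - 0.4, (2.5 - 1) * (1 - 0.4), 30)
```

Note the sign pattern: $a < 0$ for the binomial, $a = 0$ for the Poisson and $0 < a < 1$ for the negative binomial.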


Theorem 10.15 (Panjer recursion). Suppose that $N$ satisfies the Panjer(a, b)
class condition and $g_0 = P(X_1 = 0) = 0$. Then $s_0 = p_0$ and, for $r \ge 1$,
$s_r = \sum_{i=1}^{r} (a + (bi/r))\, g_i\, s_{r-i}$.


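Proof. We already know that $s_0 = p_0$ from (10.16), so suppose that $r \ge 1$.
Noting that $X_1, \ldots, X_n$ are iid, we require the following well-known identity for
exchangeable rvs:

$$
E\Bigl( X_1 \Bigm| \sum_{i=1}^{n} X_i = r \Bigr) = \frac{1}{n} \sum_{j=1}^{n} E\Bigl( X_j \Bigm| \sum_{i=1}^{n} X_i = r \Bigr) = \frac{r}{n}. \tag{10.17}
$$

Moreover, using the fact that $g_0^{(n-1)} = 0$ for $n \ge 2$, we make the preliminary calculation
that

$$
\begin{aligned}
p_{n-1} \sum_{i=1}^{r-1} \Bigl( a + \frac{bi}{r} \Bigr) g_i\, g_{r-i}^{(n-1)}
&= p_{n-1} \sum_{i=1}^{r} \Bigl( a + \frac{bi}{r} \Bigr) g_i\, g_{r-i}^{(n-1)} \\
&= p_{n-1} \sum_{i=1}^{r} \Bigl( a + \frac{bi}{r} \Bigr) P\Bigl( X_1 = i,\ \sum_{j=2}^{n} X_j = r - i \Bigr) \\
&= p_{n-1} \sum_{i=1}^{r} \Bigl( a + \frac{bi}{r} \Bigr) P\Bigl( X_1 = i,\ \sum_{j=1}^{n} X_j = r \Bigr) \\
&= p_{n-1} \sum_{i=1}^{r} \Bigl( a + \frac{bi}{r} \Bigr) P\Bigl( X_1 = i \Bigm| \sum_{j=1}^{n} X_j = r \Bigr) g_r^{(n)} \\
&= p_{n-1} \Bigl( a + \frac{b}{n} \Bigr) g_r^{(n)} = p_n\, g_r^{(n)},
\end{aligned}
$$

where (10.17) is used in the final step. Therefore, the identity (10.16) yields

$$
\begin{aligned}
s_r &= \sum_{n=1}^{\infty} p_n\, g_r^{(n)} = p_1 g_r + \sum_{n=2}^{\infty} p_n\, g_r^{(n)} \\
&= (a + b)\, p_0\, g_r + \sum_{n=2}^{\infty} \sum_{i=1}^{r-1} \Bigl( a + \frac{bi}{r} \Bigr) g_i\, p_{n-1}\, g_{r-i}^{(n-1)} \\
&= (a + b)\, p_0\, g_r + \sum_{i=1}^{r-1} \Bigl( a + \frac{bi}{r} \Bigr) g_i \sum_{n=2}^{\infty} p_{n-1}\, g_{r-i}^{(n-1)} \\
&= (a + b)\, g_r\, s_0 + \sum_{i=1}^{r-1} \Bigl( a + \frac{bi}{r} \Bigr) g_i\, s_{r-i},
\end{aligned}
$$

which is the claimed recursion, since the first term is the $i = r$ term of the sum.

Remark 10.16. In the case of both the FFT method and the Panjer recursion,
an initial discretization of the loss df $G$ generally has to be made, which introduces
an approximation error. An in-depth discussion of discretization errors for the
computation of compound distributions is to be found in Grübel and Hermesmeier
(1999, 2000) (see also references therein for a comparison of these approaches). A
slight correction to Theorem 10.15 has to be made if $g_0 = P(X_1 = 0) > 0$. One
obtains $s_0 = \sum_{k=0}^{\infty} p_k g_0^k$ and, for $r \ge 1$, $s_r = (1 - a g_0)^{-1} \sum_{i=1}^{r} (a + bi/r)\, g_i\, s_{r-i}$
(see Mikosch 2004, Theorem 3.3.10). In Notes and Comments we give further
references.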
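A direct implementation of the recursion of Theorem 10.15 is short. The following is a minimal sketch with a hypothetical function name; it covers only the case $g_0 = 0$ and does not implement the correction for $g_0 > 0$. The tiny severity distribution on {1, 2} is an illustrative choice that allows the first few values to be checked by hand.

```python
import math

def panjer_recursion(a, b, p0, g, r_max):
    """Return [s_0, ..., s_r_max] where s_0 = p_0 and
    s_r = sum_{i=1}^{r} (a + b * i / r) * g_i * s_{r-i}.
    g is the severity pmf as a list with g[0] == 0."""
    if g[0] != 0.0:
        raise ValueError("this sketch assumes g_0 = 0")
    s = [p0]
    for r in range(1, r_max + 1):
        upper = min(r, len(g) - 1)   # g_i = 0 beyond the end of the list
        s.append(sum((a + b * i / r) * g[i] * s[r - i] for i in range(1, upper + 1)))
    return s

# Poisson frequency (a = 0, b = lam, p_0 = exp(-lam)) with a hypothetical
# severity pmf putting mass 1/2 on each of the values 1 and 2.
lam = 2.0
g = [0.0, 0.5, 0.5]
s = panjer_recursion(0.0, lam, math.exp(-lam), g, r_max=60)
```

By hand, $s_1 = p_1 g_1 = e^{-2}$ and $s_2 = p_1 g_2 + p_2 g_1^2 = 1.5 e^{-2}$, which the recursion reproduces. Each $s_r$ costs at most $r$ multiplications, so the whole vector costs on the order of $r_{\max}^2$ operations, compared with the repeated convolutions in (10.16).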


Example 10.17 (Panjer recursion for the CPoi(100, LN (1, 1)) case). In Fig- 
ure 10.5 we have included the Panjer approximation for the CPoi(100, LN (1, 1)) 
case. In order to apply Theorem 10.15, we first have to discretize the lognormal df. 
An equispaced discretization of about 0.5 yields the Panjer approximation in Fig- 
ure 10.5, which is excellent for quantile values around 0.999, relevant for applica- 
tions. The 99.9% quantile estimate based on the Panjer recursion is 735, a value very 
close to the GPD estimate. Far out in the tail, beyond 0.999, say, rounding errors 
become important (the tail drifts off) and one has to be more careful; in Notes and 
Comments we give some references on how to improve recursive methods far out 
in the tail. 
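The steps of this example (discretize the lognormal df, run the recursion, read off a high quantile) can be sketched as follows. Several choices here are assumptions rather than the method used for Figure 10.5: the text does not say which discretization scheme was applied, so the sketch rounds every loss up to the next grid point (which keeps $g_0 = 0$ but biases the result slightly upwards), and all function names are hypothetical. The resulting quantile therefore need not coincide exactly with the value quoted above.

```python
import math

def lognormal_cdf(x, mu, sigma):
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def discretize_upper(cdf, h, m):
    """g_j = G(j*h) - G((j-1)*h) for j = 1..m: each loss is rounded up to the
    grid, so g_0 = 0 and Theorem 10.15 applies without correction."""
    g = [0.0] + [cdf(j * h) - cdf((j - 1) * h) for j in range(1, m + 1)]
    g[m] += 1.0 - cdf(m * h)   # lump the (tiny) remaining tail mass at the last point
    return g

def compound_poisson_quantile(lam, g, h, level):
    """Run the Panjer recursion (a = 0, b = lam) until the cumulative
    probability reaches `level`; return the corresponding loss amount."""
    s = [math.exp(-lam)]
    cum, r = s[0], 0
    while cum < level and r < len(g) - 1:
        r += 1
        s_r = (lam / r) * sum(i * g[i] * s[r - i] for i in range(1, r + 1))
        s.append(s_r)
        cum += s_r
    return r * h

h = 0.5   # grid spacing, as in the example
g = discretize_upper(lambda x: lognormal_cdf(x, 1.0, 1.0), h, m=4000)
q999 = compound_poisson_quantile(100.0, g, h, 0.999)
```

With a Poisson frequency all terms in the recursion are non-negative, which keeps the computation numerically stable in the range needed here.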


10.2.4 Poisson Mixtures 


Poisson mixture models have been used in both credit and operational risk modelling; 
for an example in the latter case see Cruz (2002, Section 5.2.2) as well as the book 
jacket, which features a negative binomial distribution (a particular Poisson mixture 
model). Poisson mixtures have been used by actuaries for a long time; the negative 
binomial made its first appearance in the actuarial literature as the distribution of 
the number of repeated accidents suffered by an individual in a given time span (see 
Seal 1969). 

In Example 10.5 we introduced the compound Poisson model $\mathrm{CPoi}(\lambda, G)$, where
$N \sim \mathrm{Poi}(\lambda)$ counts the number of losses and $G$ is the loss severity df. One disadvantage
of the Poisson frequency distribution is that $\operatorname{var}(N) = \lambda = E(N)$, whereas
count data often exhibit so-called over-dispersion, meaning that they indicate a
model where $\operatorname{var}(N) > E(N)$. A standard way to achieve this is by mixing the
intensity $\lambda$ over some df $F_\Lambda(\lambda)$, i.e. assume that $\lambda > 0$ is a realization of a positive
rv $\Lambda$ with this df so that, by definition,

$$
p_N(k) = P(N = k) = \int_0^{\infty} P(N = k \mid \Lambda = \lambda)\, dF_\Lambda(\lambda)
= \int_0^{\infty} e^{-\lambda} \frac{\lambda^k}{k!}\, dF_\Lambda(\lambda). \tag{10.18}
$$


Definition 10.18 (the mixed Poisson distribution). The rv $N$ with df (10.18) is
called a mixed Poisson rv with structure (or mixing) distribution $F_\Lambda$.

A consequence of the next result is that mixing leads to over-dispersion.

Proposition 10.19. Suppose that $N$ is mixed Poisson with structure df $F_\Lambda$. Then
$E(N) = E(\Lambda)$ and $\operatorname{var}(N) = E(\Lambda) + \operatorname{var}(\Lambda)$, i.e. for $\Lambda$ non-degenerate, $N$ is
over-dispersed.


Proof. One immediately obtains

$$
E(N) = \sum_{k=0}^{\infty} k\, p_N(k) = \int_0^{\infty} \sum_{k=0}^{\infty} k\, e^{-\lambda} \frac{\lambda^k}{k!}\, dF_\Lambda(\lambda) = \int_0^{\infty} \lambda\, dF_\Lambda(\lambda) = E(\Lambda).
$$

And, similarly,

$$
E(N^2) = \sum_{k=0}^{\infty} k^2\, p_N(k) = E(\Lambda) + E(\Lambda^2),
$$

so the result follows.


We now give a concrete example of a mixed Poisson distribution, which is partic- 
ularly important in both operational risk and credit risk modelling. Indeed we have 
already used the following result when describing the industry credit risk model 
CreditRisk+ in Section 8.4.2. 


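Proposition 10.20 (negative binomial as Poisson mixture). Suppose that the rv
$N$ has a mixed Poisson distribution with a gamma-distributed mixing variable
$\Lambda \sim \mathrm{Ga}(\alpha, \beta)$. Then $N$ has a negative binomial distribution $N \sim \mathrm{NB}(\alpha, \beta/(\beta + 1))$.


Proof. Using the definition of a gamma distribution in Section A.2.4 we have

$$
P(N = k) = \int_0^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta\lambda}\, d\lambda
= \frac{\beta^{\alpha}}{k!\, \Gamma(\alpha)} \int_0^{\infty} \lambda^{\alpha + k - 1} e^{-(\beta + 1)\lambda}\, d\lambda.
$$

Substituting $u = (\beta + 1)\lambda$, the integral can be evaluated to be

$$
\int_0^{\infty} (\beta + 1)^{-(\alpha + k)} u^{\alpha + k - 1} e^{-u}\, du = \frac{\Gamma(\alpha + k)}{(\beta + 1)^{\alpha + k}}.
$$

This yields

$$
P(N = k) = \Bigl( \frac{\beta}{\beta + 1} \Bigr)^{\alpha} \Bigl( \frac{1}{\beta + 1} \Bigr)^{k} \frac{\Gamma(\alpha + k)}{k!\, \Gamma(\alpha)}.
$$

Using the relation $\Gamma(\alpha + k) = (\alpha + k - 1) \cdots \alpha\, \Gamma(\alpha)$, we see that this is equal to
the probability mass function of a negative binomial rv with $p := \beta/(\beta + 1)$ (see
Section A.2.7).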
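The gamma-Poisson mixture can also be verified numerically. The sketch below evaluates the mixture integral by composite Simpson integration and compares it with the negative binomial probability mass function; it also recovers the over-dispersion of the mixed Poisson model from the pmf. The function names, the integration range and the parameter values $\alpha = 3$, $\beta = 0.5$ are illustrative assumptions, and the integration shortcut at zero assumes $\alpha > 1$.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Density of Ga(alpha, beta) with shape alpha and rate beta, for x > 0."""
    return math.exp(alpha * math.log(beta) + (alpha - 1) * math.log(x)
                    - beta * x - math.lgamma(alpha))

def mixed_poisson_pmf(k, alpha, beta, upper=100.0, steps=20000):
    """P(N = k) for a gamma mixing df, by composite Simpson integration
    of the Poisson pmf against the gamma density on [0, upper]."""
    def f(lmbda):
        if lmbda == 0.0:
            return 0.0   # integrand vanishes at 0 when alpha + k > 1
        poisson = math.exp(-lmbda + k * math.log(lmbda) - math.lgamma(k + 1))
        return poisson * gamma_pdf(lmbda, alpha, beta)
    h = upper / steps
    total = f(0.0) + f(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(j * h)
    return total * h / 3.0

def negbin_pmf(k, alpha, p):
    return math.exp(math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
                    + alpha * math.log(p) + k * math.log(1 - p))

alpha, beta = 3.0, 0.5
p = beta / (beta + 1.0)
max_gap = max(abs(mixed_poisson_pmf(k, alpha, beta) - negbin_pmf(k, alpha, p))
              for k in range(6))

# Moments of N from the negative binomial pmf (sum truncated at 600).
nb_mean = sum(k * negbin_pmf(k, alpha, p) for k in range(600))
nb_var = sum(k * k * negbin_pmf(k, alpha, p) for k in range(600)) - nb_mean ** 2
```

Here $E(\Lambda) = \alpha/\beta = 6$ and $\operatorname{var}(\Lambda) = \alpha/\beta^2 = 12$, so Proposition 10.19 predicts $E(N) = 6$ and $\operatorname{var}(N) = 18$.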


Recall the definition of compound sums from Section 10.2.2 (Assumption 10.3 
and Proposition 10.4). In the special case of mixed Poisson rvs, compounding leads 
to so-called compound mixed Poisson distributions. There is much literature on dfs 
of this type (see Notes and Comments).


10.2.5 Tails of Aggregate Loss Distributions


In Section 7.1.2 we defined the class of rvs with regularly varying or power tails. If
the (claim size) df $G$ is regularly varying with index $\alpha > 0$, then there exists a slowly
varying function $L$ (Definition 7.7) such that $\bar G(x) = 1 - G(x) = x^{-\alpha} L(x)$. The
next result shows that, for a wide class of counting dfs $(p_N(k))$, the df of the
compound sum $S_N$, $F_{S_N}$, inherits the power-like behaviour of $G$.


Theorem 10.21 (power-like behaviour of compound-sum distribution). Suppose
that $S_N$ is a compound sum and that there exists an $\varepsilon > 0$ such that
$\sum_{k=0}^{\infty} (1 + \varepsilon)^k p_N(k) < \infty$. If $\bar G(x) = x^{-\alpha} L(x)$ with $\alpha > 0$ and $L$ slowly varying,
then

$$
\lim_{x \to \infty} \frac{\bar F_{S_N}(x)}{\bar G(x)} = E(N),
$$

so $\bar F_{S_N}$ inherits the power-like behaviour of $\bar G$.

Proof. This result holds more generally for subexponential dfs; a proof together with
further discussions can be found in Embrechts, Klüppelberg and Mikosch (1997,
Section 1.3.3).


Example 10.22 (negative binomial). It is not difficult to show that the negative
binomial case satisfies the condition on $N$ in Theorem 10.21. The kind of argument
required is to be found in Embrechts, Klüppelberg and Mikosch (1997,
Example 1.3.11). Hence, if $\bar G(x) = x^{-\alpha} L(x)$, the tail of the compound-sum df behaves
like the tail of $G$, i.e.

$$
\bar F_{S_N}(x) \sim E(N)\, \bar G(x), \quad \text{as } x \to \infty.
$$

(For details, see Embrechts, Klüppelberg and Mikosch (1997, Section 1.3.3).)


Under the conditions of Theorem 10.21 the asymptotic behaviour of $\bar F_{S_N}(x)$ in
the case of a Pareto loss df is again Pareto with the same index. This is clearly seen
in Figure 10.6 in the linear behaviour of the simulated losses as well as the fitted
GPD. In the case of Figure 10.5, one can show that $\bar F_{S_N}(x)$ decays like a lognormal
tail; see the reference given in the proof of Theorem 10.21 for details. Note that the
GPD is able to pick up the features of the tail in both cases.


10.2.6 The Homogeneous Poisson Process 


In the previous sections we looked at counting rvs $N$ over a fixed time interval [0, 1],
say. Without any additional difficulty, we could have looked at $N(t)$ counting the
number of events in $[0, t]$ for $t \ge 0$. In the Poisson case this would correspond to
$N(t) \sim \mathrm{Poi}(\lambda t)$; hence, for fixed $t$ and on replacing $\lambda$ by $\lambda t$, all of the previous
results concerning $\mathrm{Poi}(\lambda)$ rvs can be suitably adapted.

In this section we want to integrate the rvs $N(t)$, $t \ge 0$, into a stochastic process
framework. The less mathematically trained reader should realize that there is a big
difference between a family of rvs indexed by time for which we only specify the
one-dimensional dfs (which is what we have done so far) and a stochastic process
with a specific structure in which these rvs are embedded. This difference is akin to
the difference between marginal and joint distributions, a topic we have highlighted
as very important in Chapter 5 through the notion of copulas; of course, in the
stochastic process case, there also has to be some probabilistic consistency across
time. In a certain sense, the finite-dimensional problem of Chapter 5 becomes an
infinite-dimensional problem.

Figure 10.7. Sample path of a counting process.

After these words of warning on the difference between rvs and stochastic pro- 
cesses, we now take some methodological shortcuts to arrive at our goal. The inter- 
ested reader wanting to learn more will have to delve deeper into the mathematical 
background of stochastic processes in general and counting processes in particular. 
The Notes and Comments contain some references. 


Definition 10.23 (counting processes). A stochastic process $N = (N(t))_{t \ge 0}$ is
a counting process if its sample paths are right continuous with left limits existing,
and there exists a sequence of rvs $T_0 = 0, T_1, T_2, \ldots$ tending almost surely to $\infty$
such that $N(t) = \sum_{k=1}^{\infty} I_{\{T_k \le t\}}$.


A typical realization of such a process is given in Figure 10.7. We now define the 
homogeneous Poisson process as a special counting process. 


Definition 10.24 (homogeneous Poisson process). A stochastic process
$N = (N(t))_{t \ge 0}$ is a homogeneous Poisson process with intensity (rate) $\lambda > 0$ if the
following properties hold:


(i) $N$ is a counting process;
(ii) $N(0) = 0$, almost surely;
(iii) $N$ has stationary and independent increments; and
(iv) for each $t > 0$, $N(t) \sim \mathrm{Poi}(\lambda t)$.

Figure 10.8. Ten realizations of a homogeneous Poisson process with $\lambda = 100$.


Remark 10.25. Note that conditions (iii) and (iv) imply that, for $0 < u < v < t$,
the rvs $N(v) - N(u)$ and $N(t) - N(v)$ are independent and that, for $k \ge 0$,

$$
P(N(v) - N(u) = k) = P(N(v - u) = k) = e^{-\lambda(v - u)} \frac{(\lambda(v - u))^k}{k!}.
$$

The rv $N(v) - N(u)$ counts the number of events (claims, losses) in the interval
$(u, v]$; by stationarity, it has the same df as $N(v - u)$. In Figure 10.8 we have
generated 10 realizations of a homogeneous Poisson process on [0, 1] with $\lambda = 100$.
Note the rather narrow band within which the various sample paths fall.


For practical purposes, the following result contains the main properties of the 
homogeneous Poisson process. 


Theorem 10.26 (characterizations of the homogeneous Poisson process). Suppose
that $N$ is a counting process. Then the following statements are equivalent:

(1) $N$ is a homogeneous Poisson process with rate $\lambda > 0$;

(2) $N$ has stationary and independent increments and

$$
\begin{aligned}
P(N(t) = 1) &= \lambda t + o(t), \quad \text{as } t \downarrow 0, \\
P(N(t) \ge 2) &= o(t), \quad \text{as } t \downarrow 0;
\end{aligned}
$$

(3) the inter-event times $(\Delta_k = T_k - T_{k-1})_{k \ge 1}$ are iid with df $\mathrm{Exp}(\lambda)$; and

(4) for all $t > 0$, $N(t) \sim \mathrm{Poi}(\lambda t)$ and, given that $(N(t) = k)$, the occurrence times
$T_1, T_2, \ldots, T_k$ have the same distribution as the ordered sample from $k$ independent
rvs, uniformly distributed on $[0, t]$; as a consequence, we can write
the conditional joint density as

$$
f_{T_1, \ldots, T_k \mid N(t) = k}(t_1, \ldots, t_k) = \frac{k!}{t^k}\, I_{\{0 < t_1 < \cdots < t_k < t\}}.
$$


Proof. Many standard textbooks on stochastic processes contain proofs of this
important theorem (see, for example, Mikosch 2004; Resnick 1992).

Discussion. Statement (2) in Theorem 10.26 implies that $\lambda$ can indeed be interpreted
as a rate or intensity: $\lambda = \lim_{t \downarrow 0} (1/t) P(N(t) = 1)$. Moreover, the same
statement implies that a homogeneous Poisson process does not allow for clustering
of events: $\lim_{t \downarrow 0} P(N(t) \ge 2) = 0$. Statement (3) gives an event-time definition of
a homogeneous Poisson process. It follows immediately that the first event-time
has an $\mathrm{Exp}(\lambda)$ df: $P(T_1 > t) = P(N(t) = 0) = e^{-\lambda t}$, $t \ge 0$. Statement (3), however,
goes well beyond this by stating that the inter-event times $\Delta_k$ are iid with
$\Delta_k \sim \mathrm{Exp}(\lambda)$. This leads to a straightforward way to simulate a stream of loss
events from a homogeneous Poisson process with rate $\lambda$. Moreover, this equivalent
definition immediately yields a generalization by assuming that the $\Delta_k$ are still iid
but that $\Delta_k \sim F_\Delta$, a general df. The resulting process is a so-called renewal process
(note that the only Markovian renewal process is the homogeneous Poisson process).

Finally, statement (4) yields an easy algorithm to generate the occurrences of
homogeneous Poisson times over the interval $[0, t]$ given that we have in total
$k$ events up to time $t$: we simply generate $k$ uniform rvs on $[0, t]$ and order them.
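Both simulation recipes mentioned in this discussion fit in a few lines. The sketch below uses hypothetical function names; the first function cumulates iid exponential inter-event times, following statement (3), and the second orders uniform draws given the number of events, following statement (4).

```python
import random

def hpp_by_interarrivals(lam, t_end, rng):
    """Statement (3): add up iid Exp(lam) inter-event times until t_end is passed."""
    times, t = [], rng.expovariate(lam)
    while t <= t_end:
        times.append(t)
        t += rng.expovariate(lam)
    return times

def hpp_given_count(k, t_end, rng):
    """Statement (4): given N(t_end) = k, the event times are distributed as
    k ordered independent uniforms on [0, t_end]."""
    return sorted(rng.uniform(0.0, t_end) for _ in range(k))

# Rate 100 on [0, 1]: the number of events per path should be Poi(100),
# so its mean and variance should both be close to 100.
rng = random.Random(42)
counts = [len(hpp_by_interarrivals(100.0, 1.0, rng)) for _ in range(2000)]
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / len(counts)
```

The second recipe is convenient when the number of events in the period is known or has been simulated separately, for instance from a frequency model other than the Poisson.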


Multivariate Poisson processes. In many applications we want to model the fre- 
quencies of different loss types with a number of Poisson processes while consid- 
ering possible dependence between loss frequencies for different loss types. More 
generally, we might want to construct a number of compound Poisson processes 
where loss severities for the different business lines were also dependent. A natural 
approach to modelling this dependence is to assume that all losses can be related 
to a series of underlying and independent Poisson shock processes. In insurance 
these shocks might be natural catastrophes; in credit risk modelling they might be 
a variety of economic events, such as local or global recessions; in operational risk 
modelling they might be the failure of various IT systems. When a shock occurs this 
may cause losses of several different types; the common shock causes the numbers 
of losses of each type to be dependent. See Lindskog and McNeil (2003), Pfeifer
and Nešlehová (2004) and Chavez-Demoulin, Embrechts and Nešlehová (2005) for
models of this kind.


10.2.7 Processes Related to the Poisson Process 


Using the fundamental building block of the homogeneous Poisson process, one can 
construct more general counting processes that are useful for loss-event modelling 
in finance and insurance. Such generalizations include the following. 


Renewal processes (mentioned above). The exponential waiting time distribution
is replaced by a general df $F_\Delta$.

Inhomogeneous Poisson processes. The constant intensity $\lambda$ is replaced by a deterministic
function $\lambda(\cdot)$.

Mixed Poisson processes. The deterministic constant intensity is replaced by an
rv $\Lambda$.

Doubly stochastic or Cox processes. $\lambda$ is replaced by a stochastic process
$\{\lambda_t : t \ge 0\}$ in accordance with notation used in Chapter 9 (see, for example,
Definition 9.16).

Self-exciting or Hawkes processes. $\lambda$ is replaced by a stochastic process depending
only on previous event times. See Section 7.4.3 for a concrete example.


Below, we highlight some features of some of these processes. 


Inhomogeneous Poisson process. 

Definition 10.27 (inhomogeneous Poisson). A counting process $N$ is an inhomogeneous
Poisson process if, for some deterministic function $\lambda(s) \ge 0$, the following
conditions hold:


(i) $N(0) = 0$, almost surely;
(ii) $N$ has independent increments; and
(iii) for all $t \ge 0$,

$$
\begin{aligned}
P(N(t + h) - N(t) = 1) &= \lambda(t) h + o(h), \quad h \downarrow 0, \\
P(N(t + h) - N(t) \ge 2) &= o(h), \quad h \downarrow 0.
\end{aligned}
$$

The function $\lambda(\cdot)$ is referred to as the intensity or rate function. The integral
$\Lambda(t) = \int_0^t \lambda(s)\, ds$ is referred to as the intensity measure (or cumulative intensity
function).


Remark 10.28. A characterization theorem, similar to Theorem 10.26, can be
derived. In particular, we find that, for $0 < s < t$, $N(t) - N(s) \sim \mathrm{Poi}(\Lambda(t) - \Lambda(s))$.


The inhomogeneous Poisson process is a useful tool in loss modelling whenever a 
deterministic trend or seasonality component is to be modelled in the loss frequency. 
The next example also shows that this process naturally emerges as a counting 
process for record losses. 


Example 10.29 (records). The world of finance and insurance abounds with state- 
ments on record events: the largest single day drop in the dollar/yen, the most 
expensive hurricane, the three best fund managers during the last year, the second 
largest loss due to internal fraud, the biggest one-day change in the credit spread of 
a particular company, etc. Likewise, the world of records is intimately related to the 
(general) theory of Poisson processes. In Notes and Comments we shall give several 
references for this. Below we indicate how an easy example related to a question on 
records leads to an inhomogeneous Poisson process as a model. 

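Suppose that the loss rvs $X_i \ge 0$ are iid with density function $f(x) > 0$, $x > 0$.
Define the counting process $N$:

$$
N(t) = \sum_{i=1}^{\infty} I_{\{X_i \le t,\ X_i > \max(X_{i-1}, \ldots, X_1)\}}, \quad t \ge 0.
$$

$N(t)$ counts the number of records in the sequence $(X_i)_{i \ge 1}$ of size less than $t$
and $(N(t))$ is referred to as the record process. It follows that, for $h, t > 0$,

$$
\begin{aligned}
P(N(t + h) - N(t) \ge 1) &= \sum_{i=1}^{\infty} P\bigl( X_i \in (t, t + h] \text{ and } X_{i-1} \le t, \ldots, X_1 \le t \bigr) \\
&= \sum_{i=1}^{\infty} (F(t + h) - F(t)) (F(t))^{i-1} \\
&= \frac{F(t + h) - F(t)}{1 - F(t)} \\
&= \frac{f(t)}{1 - F(t)}\, h + o(h), \quad \text{as } h \downarrow 0.
\end{aligned}
$$

Moreover, for $h, t > 0$:

$$
\begin{aligned}
P(N(t + h) - N(t) \ge 2)
&\le \sum_{i < j} P\bigl( X_1 \le t, \ldots, X_{i-1} \le t,\ X_i \in (t, t + h], \\
&\qquad\qquad X_{i+1} \le t + h, \ldots, X_{j-1} \le t + h,\ X_j \in (t, t + h] \bigr) \\
&= \Bigl( \int_t^{t+h} f(s)\, ds \Bigr)^2 \sum_{i < j} (F(t))^{i-1} (F(t + h))^{j-i-1} \\
&= o(h), \quad \text{as } h \downarrow 0.
\end{aligned}
$$

From these calculations one deduces that the record process $N$ is inhomogeneous
Poisson with rate function $\lambda(t) = f(t)/(1 - F(t))$, the so-called hazard rate of $F$,
a notion that we encountered in Section 9.2.1.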
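The record example lends itself to a simple simulation check. For standard exponential losses the hazard rate is constant and equal to 1, so the cumulative intensity is $\Lambda(t) = t$ and the number of records of size at most $t$ should have mean $t$. The sketch below (function name and the choice $t = 3$ are illustrative) counts such records; it can stop at the first observation exceeding $t$, because every later record must exceed the running maximum and hence $t$.

```python
import random

def count_records_up_to(t, draw, rng):
    """Number of record values of size <= t in an iid sequence drawn by `draw`."""
    n_records, current_max = 0, float("-inf")
    while True:
        x = draw(rng)
        if x > t:
            return n_records          # all later records are larger than t
        if x > current_max:
            n_records, current_max = n_records + 1, x

rng = random.Random(7)
t = 3.0
counts = [count_records_up_to(t, lambda r: r.expovariate(1.0), rng)
          for _ in range(20000)]
mean_records = sum(counts) / len(counts)   # should be close to Lambda(3) = 3
```

For a general continuous df $F$ the same experiment should give a mean close to $\Lambda(t) = -\ln(1 - F(t))$, the integrated hazard rate.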


Suppose now that, as in most practical cases, $\Lambda(t)$ is strictly increasing, so
$\Lambda(\Lambda^{-1}(t)) = \Lambda^{-1}(\Lambda(t)) = t$. We can then always transform an inhomogeneous
Poisson process $N$ with integrated intensity $\Lambda$ into a homogeneous Poisson process
with intensity 1 by a change of time.


Proposition 10.30 (time change, operational time). Suppose that $N$ is an inhomogeneous
Poisson process with $\Lambda$ strictly increasing and define, for $t \ge 0$,
$\tilde N(t) = N(\Lambda^{-1}(t))$. Then $\tilde N$ is homogeneous Poisson with intensity 1.


Proof. For $t \ge 0$ fixed and $k \ge 0$,

$$
P(\tilde N(t) = k) = P(N(\Lambda^{-1}(t)) = k) = e^{-\Lambda(\Lambda^{-1}(t))} \frac{(\Lambda(\Lambda^{-1}(t)))^k}{k!} = e^{-t} \frac{t^k}{k!},
$$

so $\tilde N(t) \sim \mathrm{Poi}(t)$. By definition, the increments of $\tilde N$ are independent; moreover,
for $0 < u < v$ we have that

$$
\begin{aligned}
P(\tilde N(v) - \tilde N(u) = k) &= P\bigl( N(\Lambda^{-1}(v)) - N(\Lambda^{-1}(u)) = k \bigr) \\
&= e^{-(\Lambda(\Lambda^{-1}(v)) - \Lambda(\Lambda^{-1}(u)))} \frac{(\Lambda(\Lambda^{-1}(v)) - \Lambda(\Lambda^{-1}(u)))^k}{k!} \\
&= e^{-(v - u)} \frac{(v - u)^k}{k!},
\end{aligned}
$$

from which stationarity follows.

This is one of the many examples in insurance and finance where a more complicated
process ($N$) can be reduced to a standard (easier) model ($\tilde N$) through
the careful choice of a new time clock (a so-called time change construction)
(see also Section 9.2.3 on credit risk). Proposition 10.30 can be formulated more
generally for $\Lambda$ not strictly increasing and the converse also holds. Proposition 10.30
justifies the common simplifying assumption that a loss frequency
model is homogeneous (unit rate) Poisson, albeit in many cases only in operational
time. The original time-scale of $N$ is slowed down or speeded up in such a
way that, on average, $\tilde N$ has one claim per time unit, whereas $N$ has, on average,
$\Lambda(1)$ claims.
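Read in the converse direction, the time change gives a simulation method: generate the arrival times of a unit-rate homogeneous Poisson process in operational time and map them back through $\Lambda^{-1}$. The sketch below assumes a hypothetical intensity $\lambda(s) = 2s$, so that $\Lambda(t) = t^2$ and $\Lambda^{-1}(u) = \sqrt{u}$; the function name is also an illustrative choice.

```python
import math
import random

def inhomogeneous_arrivals(cum_intensity_inv, cum_intensity_end, rng):
    """Arrival times on [0, t_end] of an inhomogeneous Poisson process:
    unit-rate arrivals u_1 < u_2 < ... <= Lambda(t_end) in operational time
    are mapped back to real time by Lambda^{-1}."""
    times, u = [], rng.expovariate(1.0)
    while u <= cum_intensity_end:
        times.append(cum_intensity_inv(u))
        u += rng.expovariate(1.0)
    return times

# Hypothetical intensity lambda(s) = 2s on [0, 3]: Lambda(3) = 9.
rng = random.Random(3)
paths = [inhomogeneous_arrivals(math.sqrt, 9.0, rng) for _ in range(5000)]
mean_count = sum(len(p) for p in paths) / len(paths)   # should be close to 9
```

The method needs an explicit (or numerically computed) inverse of $\Lambda$; when that is inconvenient, a rejection-type scheme based only on evaluations of $\lambda(\cdot)$ is an alternative.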


Remark 10.31. A standard way in which an inhomogeneous Poisson process
can be obtained from a homogeneous Poisson process is by random sampling.
Suppose an intensity function $\lambda$ satisfies $\lambda(s) \le c < \infty$ for $s \ge 0$. Start from
a homogeneous Poisson process with rate $c > 0$ and denote its arrival times
by $T_0 = 0, T_1, T_2, \ldots$. Construct a new process $\tilde N$ from $(T_i)_{i \ge 0}$ through deletion
of each $T_i$ independently of the other $T_j$ with probability $1 - (\lambda(T_i)/c)$.
The so-called thinned counting process $\tilde N$ consists of the remaining (undeleted)
points. It can be shown that this process is inhomogeneous Poisson with intensity
function $\lambda(\cdot)$.
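The thinning construction translates directly into code. In the sketch below the function name and the seasonal intensity $\lambda(s) = 50(1 + \sin(2\pi s))$, bounded by $c = 100$, are illustrative assumptions; over [0, 1] this intensity integrates to 50.

```python
import math
import random

def thinned_arrivals(intensity, c, t_end, rng):
    """Generate a rate-c homogeneous Poisson process on [0, t_end] and keep each
    arrival T_i independently with probability intensity(T_i) / c."""
    kept, t = [], rng.expovariate(c)
    while t <= t_end:
        if rng.random() < intensity(t) / c:
            kept.append(t)
        t += rng.expovariate(c)
    return kept

def seasonal(s):
    # Hypothetical seasonal intensity, bounded above by c = 100.
    return 50.0 * (1.0 + math.sin(2.0 * math.pi * s))

rng = random.Random(11)
paths = [thinned_arrivals(seasonal, 100.0, 1.0, rng) for _ in range(2000)]
mean_count = sum(len(p) for p in paths) / len(paths)   # should be close to 50
```

The closer the bound $c$ is to the actual intensity, the fewer candidate points are discarded, so a tight bound makes the method more efficient.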


Mixed Poisson process. The mixed Poisson rvs of Section 10.2.4 can be embedded
into a so-called mixed Poisson process. A single realization of such a process cannot
be distinguished through statistical means from a realization of a homogeneous
Poisson process; indeed, to simulate a sample path, one first draws a realization
of the random intensity $\lambda = \Lambda(\omega)$ and then draws the sample path of the homogeneous
Poisson process with rate $\lambda$. (Here $\Lambda$ denotes an rv and not the intensity
measure in the inhomogeneous Poisson case above.) Only by repeating this simulation
more frequently does one see the different probabilistic nature of the mixed
Poisson process: compare Figure 10.9 with Figure 10.8. In the former we have
simulated 10 sample paths from a mixed Poisson process with mixing variable
$\Lambda \sim \mathrm{Ga}(100, 1)$ so that $E(\Lambda) = 100$. Note the much greater variability in the
paths.


Example 10.32. When counting processes are used in credit risk modelling the times
$T_k$ typically correspond to credit events, for instance defaults or downgradings. More
precisely, a credit event can be constructed as the first jump of a counting process $N$.
The df of the time to the credit event can be easily derived by observing that
$P(T_1 > t) = P(N(t) = 0)$. This probability can be calculated in a straightforward way for
a homogeneous Poisson process with intensity $\lambda$; we obtain $P(N(t) = 0) = e^{-\lambda t}$.
When $N$ is a mixed Poisson process with mixing df $F_\Lambda$ we obtain

$$
P(T_1 > t) = P(N(t) = 0) = \int_0^{\infty} e^{-t\lambda}\, dF_\Lambda(\lambda) = \hat F_\Lambda(t),
$$

the Laplace-Stieltjes transform of $F_\Lambda$ in $t$. In the special case when $\Lambda \sim \mathrm{Ga}(\alpha, \beta)$,
the negative binomial case treated in Proposition 10.20, one finds that

$$
\begin{aligned}
P(T_1 > t) &= \int_0^{\infty} e^{-t\lambda} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta\lambda}\, d\lambda \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)} (t + \beta)^{-\alpha} \int_0^{\infty} e^{-s} s^{\alpha - 1}\, ds \\
&= \beta^{\alpha} (t + \beta)^{-\alpha}, \quad t \ge 0,
\end{aligned}
$$

so that $T_1$ has a Pareto distribution $T_1 \sim \mathrm{Pa}(\alpha, \beta)$ (see Section A.2.8).

Figure 10.9. Ten realizations of a mixed Poisson process with $\Lambda \sim \mathrm{Ga}(100, 1)$.
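The Pareto law of the first event time can be checked by simulation. The sketch below draws the random intensity from a gamma distribution and then the first event time from an exponential distribution with that rate; the function names and the values $\alpha = 2$, $\beta = 1$ are illustrative. Note that the standard-library gamma sampler is parametrized by shape and scale, so the scale is set to $1/\beta$ to match a rate parameter $\beta$.

```python
import random

def first_event_time_mixed_poisson(alpha, beta, rng):
    """Draw Lambda ~ Ga(alpha, beta) (rate beta), then T_1 ~ Exp(Lambda)."""
    lam = rng.gammavariate(alpha, 1.0 / beta)   # second argument is a scale
    return rng.expovariate(lam)

def pareto_survival(t, alpha, beta):
    """P(T_1 > t) = beta^alpha * (t + beta)^(-alpha)."""
    return (beta / (t + beta)) ** alpha

rng = random.Random(5)
alpha, beta, t = 2.0, 1.0, 1.0
sample = [first_event_time_mixed_poisson(alpha, beta, rng) for _ in range(20000)]
empirical = sum(1 for x in sample if x > t) / len(sample)
theoretical = pareto_survival(t, alpha, beta)   # (1 / 2)^2
```

The heavy tail of the waiting time arises from paths with a small realized intensity, for which the first event can take a long time to occur.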


Processes with stochastic intensity. A further important class of models is obtained
when $\lambda$ in the homogeneous Poisson case is replaced by a general stochastic process
$(\lambda_t)$, yielding a two-tier stochastic model or so-called doubly stochastic process.

For example, one could take $\lambda_t$ to be a diffusion or, alternatively, a finite-state
Markov chain. The latter case gives rise to a regime-switching model: in each state of
the Markov chain the intensity has a different constant level and the process remains
in that state for an exponential length of time, before jumping to another state. In
Figure 10.10 we have simulated the sample path of such a process randomly switching
between $\lambda = 10$ and $\lambda = 100$. In Section 9.2.3 we looked at doubly stochastic
random times, which correspond to the first jump of a doubly stochastic Poisson
process.
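A two-state regime-switching counting process of this kind can be simulated as follows. This is a sketch under assumptions: the intensities 10 and 100 are those mentioned in the text, but the common switching rate of 1 per unit time, the starting state and the function name are hypothetical choices, and the code is not the one used for Figure 10.10.

```python
import random

def regime_switching_arrivals(rates, switch_rate, t_end, rng, state=0):
    """Event times on [0, t_end] of a counting process whose intensity is
    rates[state]; the state flips after an Exp(switch_rate) sojourn time."""
    times, t = [], 0.0
    while t < t_end:
        sojourn_end = min(t + rng.expovariate(switch_rate), t_end)
        # Within a sojourn the process is homogeneous Poisson with rate rates[state].
        s = t + rng.expovariate(rates[state])
        while s <= sojourn_end:
            times.append(s)
            s += rng.expovariate(rates[state])
        t, state = sojourn_end, 1 - state
    return times

rng = random.Random(9)
paths = [regime_switching_arrivals((10.0, 100.0), 1.0, 1.0, rng) for _ in range(4000)]
counts = [len(p) for p in paths]
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / len(counts)
```

Restarting the exponential clock at the beginning of each sojourn is legitimate because of the lack-of-memory property of the exponential distribution. As with mixed Poisson processes, the counts are over-dispersed: their variance clearly exceeds their mean.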


Notes and Comments 


The story behind the name insurance analytics is told in Embrechts (2002). A good 
place to start a search for actuarial literature is the website of the International Actu- 
arial Association: www.actuaries.org. Several interesting books can be found on the 
website of the Society of Actuaries, www.soa.org (whose postal address is, coinci- 
dentally, 475 North Martingale Rd #600, Schaumburg, Illinois). A standard Society 
of Actuaries textbook on actuarial mathematics is Bowers et al. (1986); financial
economics for actuaries is to be found in Panjer et al. (1998). For our purposes excellent
texts are Mikosch (2004) and Partrat and Besson (2004). Rolski et al. (1999)
gives a broad, more technical overview of the relevant stochastic process models. In
Chapter 6 we have given several references to actuarial tools relevant for the study
of risk measures; key words there were premium principles (Section 6.1), comonotonicity
(Sections 5.1.6 and 6.2.2) and Fréchet bounds (Sections 5.1.6 and 6.2).
Finally, an overview of the state of the art of actuarial modelling is to be found in
Teugels and Sundt (2004).

Figure 10.10. Realization of a counting process with
a regime switch from $\lambda = 10$ to $\lambda = 100$.

Actuarial textbooks dealing in particular with the modelling of loss distributions in 
insurance are Hogg and Klugman (1984) and Klugman, Panjer and Willmot (1998). 
Besides the general references above, an early textbook discussion of the use of 
numerical methods for the calculation of the df of total loss amount rvs is Feilmeier 
and Bertram (1987); Bühlmann (1984) contains a first comparison between the FFT
method and Panjer recursion. More extensive comparisons, taking rounding and
discretization errors into account, are found in Grübel and Hermesmeier (1999, 2000).
A discussion of the use of the FFT in insurance is given in Embrechts, Grübel and
Pitts (1993). Algorithms for the FFT are freely available on the Web, as a search will 
quickly reveal. The original paper by Panjer (1981) also contains a density version of 
Theorem 10.15. For an application of Panjer recursion to credit risk measurement 
within the CreditRisk+ framework, see Credit Suisse Financial Products (1997). 
Based on Giese (2003), Haaf, Reiss and Schoenmakers (2004) propose an alterna- 
tive recursive method. For more recent work on Panjer recursion, especially in the 
multivariate case, see, for example, Hesselager (1996) and Sundt (1999, 2000). 

Asymptotic approximation methods going beyond the normal approximation
(10.13) are known in statistics under the names Berry–Esséen, Edgeworth and
saddle-point. The former two are discussed, for example, in Embrechts, Klüppelberg
and Mikosch (1997) and are of more theoretical importance. The saddle-point
technique is very useful: see Jensen (1995) for an excellent summary, and Embrechts
et al. (1985) for an application to compound distributions. Gordy (2002) discusses
the importance of saddle-point methods for credit risk modelling, again within the
context of CreditRisk+. Wider applications within risk management can be found
in Studer (2001) and Glasserman (2003a,b).

Poisson mixture models with insurance applications in mind are summarized in 
Grandell (1997) (see also Bening and Korolev 2002). In order to enter more deeply 
into the world of counting processes, one has to study the theory of point processes. 
Very comprehensive and readable accounts are Daley and Vere-Jones (2003) and 
Karr (1991). A study of this theory is both mathematically demanding and practically 
rewarding. Such models are being used increasingly in credit risk. The notion of 
time change is fundamental to many applications in insurance and finance; for an 
example of how it can be used to model operational risk, see Embrechts, Kaufmann 
and Samorodnitsky (2004). For its introduction into finance, see Ané and Geman 
(2000) and Dacorogna et al. (2001). An excellent survey is to be found in Peeters 
(2004). 

What have we not included in our brief account of the elements of insurance 
analytics? We have not treated ruin theory and the general stochastic process theory 
of insurance risk, credibility theory, dynamic financial analysis, also referred to as 
dynamic solvency testing, and reinsurance, to name but a few omissions. 

The stochastic process theory of insurance risk has a long tradition. The first
fundamental summary came through the pioneering work of Cramér (1994a,b).
Bühlmann (1970) made the field popular with several generations of actuaries. This
early work has now been generalized in every way possible. A standard textbook on
ruin theory is Asmussen (2000). The modelling of large claims and its consequences
for ruin estimates can be found in Embrechts, Klüppelberg and Mikosch (1997).

Credibility theory concerns premium calculation for non-homogeneous portfolios 
and has a very rich history rooted in non-life insurance mathematics. Its basic con- 
cepts were first developed by American actuaries in the 1920s; pioneering papers 
in this early period were Mowbray (1914) and Whitney (1918). Further important 
work is found in the papers of Bailey (1945), Robbins (1955, 1964) and Bühlmann
(1967, 1968, 1971). An excellent review article tracing the historical development
of the basic ideas is Norberg (1979); see also Jewell (1990) for a more recent review.
Various textbook versions exist: Bühlmann and Gisler (2005) give an authoritative
account of its actuarial usage and hint at applications to financial risk management. 

Dynamic financial analysis (DFA), also referred to as dynamic solvency testing 
(DST), is a systematic approach, based on large-scale computer simulations, for 
the integrated financial modelling of non-life insurance and reinsurance companies 
aimed at assessing the risks and benefits associated with strategic decisions (see 
Blum 2005; Blum and Dacorogna 2004). An easy introduction can be found in 
Kaufmann, Gadmer and Klett (2001). The interested reader can consult the website 
of the Casualty Actuarial Society (www.casact.org/research/drm). 


Appendix 


A.1 Miscellaneous Definitions and Results 
A.1.1 Type of Distribution 
Definition A.1 (equality in type). Two rvs V and W (or their distributions) are
said to be of the same type if there exist constants $a > 0$ and $b \in \mathbb{R}$ such that

$$V \overset{d}{=} aW + b.$$


In other words, distributions of the same type are obtained from one another by 
location and scale transformations. 


A.1.2 Generalized Inverses and Quantiles


Let T be an increasing function, i.e. a function satisfying $y > x \Rightarrow T(y) \ge T(x)$,
with strict inequality on the right-hand side for some pair y > x. Thus an increasing
function may have flat sections; if we want to rule this out, we stipulate that T is
strictly increasing, so $y > x \iff T(y) > T(x)$. We first note some useful facts
concerning what happens when increasing transformations are applied to rvs.


Lemma A.2.
(i) If X is an rv and T is increasing, then $\{X \le x\} \subset \{T(X) \le T(x)\}$ and

$$P(T(X) \le T(x)) = P(X \le x) + P(T(X) = T(x),\ X > x). \tag{A.1}$$

(ii) If F is the df of the rv X, then $P(F(X) \le F(x)) = P(X \le x)$.


The second statement follows from (A.1) by noting that, for any x, the event given 
by {F(X) = F(x), X > x} corresponds to a flat piece of the df F and thus has zero 
probability mass. 

The generalized inverse of an increasing function T is defined to be
$T^{\leftarrow}(y) = \inf\{x : T(x) \ge y\}$, where we use the convention $\inf \emptyset = \infty$. Strictly speaking,
this generalized inverse is known as the left-continuous generalized inverse. The
following basic properties may be verified quite easily.


Proposition A.3 (properties of the generalized inverse). For T increasing, the
following hold.

(i) $T^{\leftarrow}$ is an increasing, left-continuous function.
(ii) T is continuous $\iff$ $T^{\leftarrow}$ is strictly increasing.
(iii) T is strictly increasing $\iff$ $T^{\leftarrow}$ is continuous.

For the remaining properties assume additionally that $T^{\leftarrow}(y) < \infty$.

(iv) If T is right-continuous, $T(x) \ge y \iff T^{\leftarrow}(y) \le x$.
(v) $T^{\leftarrow} \circ T(x) \le x$.
(vi) $T \circ T^{\leftarrow}(y) \ge y$.
(vii) T is strictly increasing $\Rightarrow$ $T^{\leftarrow} \circ T(x) = x$.
(viii) T is continuous $\Rightarrow$ $T \circ T^{\leftarrow}(y) = y$.

Figure A.1. Calculation of quantiles in tricky cases. The first case (a) is a continuous df,
but the flat piece corresponds to an interval with zero probability mass. In the second case (b)
there is an atom of probability mass such that, for X with df F, we have $P(X = q_{\alpha}(F)) > 0$.


We apply the idea of generalized inverses to distribution functions. If F is a df,
then the generalized inverse $F^{\leftarrow}$ is known as the quantile function of F. In this
case, for $\alpha \in (0, 1)$, we also use the alternative notation $q_{\alpha}(F) = F^{\leftarrow}(\alpha)$ for the
$\alpha$-quantile of F. Figure A.1 illustrates the calculation of quantiles in two tricky
cases.

In general, since a df need not be strictly increasing (part (a) of the figure), we have
$F^{\leftarrow} \circ F(x) \le x$, by Proposition A.3(v). But the values x, where $F^{\leftarrow} \circ F(x) \ne x$,
correspond to flat pieces and have zero probability mass. That is, we have the
following useful fact.


Proposition A.4. If X is an rv with df F, then $P(F^{\leftarrow} \circ F(X) = X) = 1$.
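
The definition $F^{\leftarrow}(y) = \inf\{x : F(x) \ge y\}$ is easy to try out on a discrete df, where both tricky cases of Figure A.1 (atoms and flat pieces) occur. The following Python sketch is an illustration only; the function names and the three-atom distribution are assumptions made here, not part of the text.

```python
import bisect


def step_df(atoms, cum_probs, x):
    """Right-continuous df F(x) of a discrete rv with sorted atoms."""
    i = bisect.bisect_right(atoms, x)      # number of atoms <= x
    return cum_probs[i - 1] if i > 0 else 0.0


def generalized_inverse(atoms, cum_probs, y):
    """F^<-(y) = inf{x : F(x) >= y} for y in (0, 1]; inf of the empty set is +inf."""
    i = bisect.bisect_left(cum_probs, y)   # first atom with F(atom) >= y
    return atoms[i] if i < len(atoms) else float("inf")


atoms = [1.0, 2.0, 4.0]
cum_probs = [0.25, 0.75, 1.0]              # F(1), F(2), F(4)

median = generalized_inverse(atoms, cum_probs, 0.5)
# x = 3 lies on a flat piece of F, so F^<-(F(3)) falls back to the atom at 2,
# illustrating the inequality F^<- o F(x) <= x of Proposition A.3(v).
back = generalized_inverse(atoms, cum_probs, step_df(atoms, cum_probs, 3.0))
print(median, back)
```

Points such as x = 3 carry no probability mass, which is why Proposition A.4 still holds for this df.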


A.1.3 Karamata’s Theorem 


The following result for regularly varying functions is used in Chapter 7. For more 
details see Bingham, Goldie and Teugels (1987). Essentially, the result says that the 
slowly varying function can be taken outside the integral as if it were a constant. Note 
that the symbol “~” indicates asymptotic equality here, i.e. if we write $a(x) \sim b(x)$
as $x \to x_0$, we mean $\lim_{x \to x_0} a(x)/b(x) = 1$.


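Theorem A.5 (Karamata’s Theorem). Let L be a slowly varying function which
is locally bounded in $[x_0, \infty)$ for some $x_0 \ge 0$. Then,

(a) for $\kappa > -1$,
$$\int_{x_0}^{x} t^{\kappa} L(t)\,dt \sim \frac{1}{\kappa+1}\, x^{\kappa+1} L(x), \quad x \to \infty,$$

(b) for $\kappa < -1$,
$$\int_{x}^{\infty} t^{\kappa} L(t)\,dt \sim -\frac{1}{\kappa+1}\, x^{\kappa+1} L(x), \quad x \to \infty.$$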
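
As a simple check of part (a), consider the slowly varying function $L(t) = \ln t$ with $\kappa = 1$ and $x_0 = 1$. This worked example is added here for illustration and is not taken from the references above; the integral can be computed in closed form by integration by parts.

```latex
\int_1^x t \ln t \, dt = \tfrac12 x^2 \ln x - \tfrac14 x^2 + \tfrac14,
\qquad
\frac{\int_1^x t \ln t \, dt}{\tfrac12 x^2 \ln x}
  = 1 - \frac{1}{2 \ln x} + \frac{1}{2 x^2 \ln x} \longrightarrow 1,
  \quad x \to \infty.
```

The ratio does tend to one, as the theorem asserts with $\frac{1}{\kappa+1} x^{\kappa+1} L(x) = \tfrac12 x^2 \ln x$, but only at the slow rate $1/\ln x$, so the asymptotic equivalence should not be mistaken for a good approximation at moderate x.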
A.2 Probability Distributions


The gamma and beta functions appear in the definitions of a number of these
distributions. The gamma function is

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx, \quad \alpha > 0, \tag{A.2}$$

and satisfies the useful recursive relationship $\Gamma(\alpha + 1) = \alpha\Gamma(\alpha)$. The beta function
is

$$\beta(a, b) = \int_0^1 x^{a-1}(1 - x)^{b-1}\,dx, \quad a, b > 0. \tag{A.3}$$

It is related to the gamma function by $\beta(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$.
A.2.1 Beta 


The rv X has a beta distribution, written $X \sim \mathrm{Beta}(a, b)$, if its density is

$$f(x) = \frac{1}{\beta(a, b)}\, x^{a-1}(1 - x)^{b-1}, \quad 0 < x < 1,\ a, b > 0, \tag{A.4}$$

where $\beta(a, b)$ is the beta function in (A.3). The uniform distribution $X \sim U(0, 1)$ is
obtained as a special case when a = b = 1. The mean and variance of the distribution
are, respectively, $E(X) = a/(a + b)$ and $\mathrm{var}(X) = ab/((a + b + 1)(a + b)^2)$.
A.2.2 Exponential 
The rv X has an exponential distribution, written $X \sim \mathrm{Exp}(\lambda)$, if its density is

$$f(x) = \lambda \exp(-\lambda x), \quad x > 0,\ \lambda > 0. \tag{A.5}$$

The mean of this distribution is $E(X) = \lambda^{-1}$ and the variance is $\mathrm{var}(X) = \lambda^{-2}$.
A.2.3 F 
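The rv X has an F distribution, written $X \sim F(\nu_1, \nu_2)$, if its density is

$$f(x) = \frac{1}{\beta(\tfrac12\nu_1, \tfrac12\nu_2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} \frac{x^{(\nu_1-2)/2}}{(1 + \nu_1 x/\nu_2)^{(\nu_1+\nu_2)/2}}, \quad x > 0,\ \nu_1, \nu_2 > 0. \tag{A.6}$$

The mean of this distribution is $E(X) = \nu_2/(\nu_2 - 2)$ provided that $\nu_2 > 2$. Provided
that $\nu_2 > 4$, the variance is

$$\mathrm{var}(X) = 2\left(\frac{\nu_2}{\nu_2 - 2}\right)^2 \frac{\nu_1 + \nu_2 - 2}{\nu_1(\nu_2 - 4)}.$$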


A.2.4 Gamma 


The rv X has a gamma distribution, written $X \sim \mathrm{Ga}(\alpha, \beta)$, if its density is

$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} \exp(-\beta x), \quad x > 0,\ \alpha > 0,\ \beta > 0, \tag{A.7}$$

where $\Gamma(\alpha)$ denotes the gamma function in (A.2). Using the recursive property of
the gamma function, the mean and variance of the gamma distribution are easily
calculated to be $E(X) = \alpha/\beta$ and $\mathrm{var}(X) = \alpha/\beta^2$. For fitting a multivariate t
distribution using the EM approach of Section 3.2.4 it is also useful to know that
$E(\ln X) = \psi(\alpha) - \ln(\beta)$, where $\psi(k) = d\ln(\Gamma(k))/dk$ is the digamma or psi
function.

An exponential distribution is obtained in the special case when $\alpha = 1$. If
$X \sim \mathrm{Ga}(\alpha, \beta)$ and k > 0, then $kX \sim \mathrm{Ga}(\alpha, \beta/k)$. For two independent gamma variates
$X_1 \sim \mathrm{Ga}(\alpha_1, \beta)$ and $X_2 \sim \mathrm{Ga}(\alpha_2, \beta)$ we have that $X_1 + X_2 \sim \mathrm{Ga}(\alpha_1 + \alpha_2, \beta)$. Note
also that, if $X \sim \mathrm{Ga}(\tfrac12\nu, \tfrac12)$, then X has a chi-squared distribution with $\nu$ degrees
of freedom, also written $X \sim \chi^2_{\nu}$.
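
When these formulas are checked by simulation, the main pitfall is the parametrization: in (A.7) $\beta$ is a rate, whereas Python's `random.gammavariate` expects a scale as its second argument. The sketch below, with an arbitrarily chosen seed, sample size and parameter values, compares sample moments with $\alpha/\beta$, $\alpha/\beta^2$ and $\psi(\alpha) - \ln(\beta)$. The digamma function is approximated by a central difference of `math.lgamma`, which is an illustrative shortcut rather than a recommended numerical method.

```python
import math
import random


def sample_gamma(alpha, beta, rng):
    """Draw from Ga(alpha, beta) with beta a RATE, as in (A.7).

    random.gammavariate takes a SCALE as its second argument, so we pass 1/beta.
    """
    return rng.gammavariate(alpha, 1.0 / beta)


def digamma(k, h=1e-5):
    """psi(k) = d ln Gamma(k) / dk via a central difference (illustrative only)."""
    return (math.lgamma(k + h) - math.lgamma(k - h)) / (2.0 * h)


rng = random.Random(42)
alpha, beta, n = 3.0, 2.0, 200_000
xs = [sample_gamma(alpha, beta, rng) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
mean_log = sum(math.log(x) for x in xs) / n

print(mean, alpha / beta)                         # sample vs. alpha / beta
print(var, alpha / beta**2)                       # sample vs. alpha / beta^2
print(mean_log, digamma(alpha) - math.log(beta))  # sample vs. psi(alpha) - ln(beta)
```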


A.2.5 Generalized Inverse Gaussian 
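The rv X has a generalized inverse Gaussian (GIG) distribution, written
$X \sim N^{-}(\lambda, \chi, \psi)$, if its density is

$$f(x) = \frac{\chi^{-\lambda}(\sqrt{\chi\psi})^{\lambda}}{2K_{\lambda}(\sqrt{\chi\psi})}\, x^{\lambda-1} \exp\bigl(-\tfrac12(\chi x^{-1} + \psi x)\bigr), \quad x > 0, \tag{A.8}$$

where $K_{\lambda}$ denotes a modified Bessel function of the third kind with index $\lambda$ and the
parameters satisfy $\chi > 0$, $\psi \ge 0$ if $\lambda < 0$; $\chi > 0$, $\psi > 0$ if $\lambda = 0$; and $\chi \ge 0$,
$\psi > 0$ if $\lambda > 0$. For more on this Bessel function see Abramowitz and Stegun
(1965).

The GIG density actually contains the gamma and inverse gamma densities as special
limiting cases, corresponding to $\chi = 0$ and $\psi = 0$, respectively. In these cases
(A.8) must be interpreted as a limit, which can be evaluated using the asymptotic
relations $K_{\lambda}(x) \sim \Gamma(\lambda) 2^{\lambda-1} x^{-\lambda}$ as $x \to 0+$ for $\lambda > 0$ and $K_{\lambda}(x) \sim \Gamma(-\lambda) 2^{-\lambda-1} x^{\lambda}$
as $x \to 0+$ for $\lambda < 0$. The fact that $K_{\lambda}(x) = K_{-\lambda}(x)$ is also useful. In this way it can
be verified that, for $\lambda > 0$ and $\chi = 0$, $X \sim \mathrm{Ga}(\lambda, \tfrac12\psi)$. If $\lambda < 0$ and $\psi = 0$, we have
$X \sim \mathrm{Ig}(-\lambda, \tfrac12\chi)$. The case $\lambda = -\tfrac12$ is known as the inverse Gaussian distribution.
Note that, in general, if $Y \sim N^{-}(\lambda, \chi, \psi)$, then $1/Y \sim N^{-}(-\lambda, \psi, \chi)$.
For the non-limiting case when $\chi > 0$ and $\psi > 0$ it may be calculated that

$$E(X^{\alpha}) = \left(\frac{\chi}{\psi}\right)^{\alpha/2} \frac{K_{\lambda+\alpha}(\sqrt{\chi\psi})}{K_{\lambda}(\sqrt{\chi\psi})}, \quad \alpha \in \mathbb{R}, \tag{A.9}$$

$$E(\ln X) = \left.\frac{dE(X^{\alpha})}{d\alpha}\right|_{\alpha=0}. \tag{A.10}$$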

A.2.6 Inverse Gamma 


The rv X has an inverse gamma distribution, written $X \sim \mathrm{Ig}(\alpha, \beta)$, if its density is

$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{-(\alpha+1)} \exp(-\beta/x), \quad x > 0,\ \alpha > 0,\ \beta > 0. \tag{A.11}$$

Note that if $Y \sim \mathrm{Ga}(\alpha, \beta)$, then $1/Y \sim \mathrm{Ig}(\alpha, \beta)$. Provided that $\alpha > 1$, the
mean is $E(X) = \beta/(\alpha - 1)$, and provided that $\alpha > 2$ the variance is
$\mathrm{var}(X) = \beta^2/((\alpha - 1)^2(\alpha - 2))$. Moreover, $E(\ln X) = \ln(\beta) - \psi(\alpha)$.

A.2.7 Negative Binomial


The rv N has a negative binomial distribution with parameters $\alpha > 0$ and $0 < p < 1$,
written $N \sim \mathrm{NB}(\alpha, p)$, if its probability mass function is

$$P(N = k) = \binom{\alpha + k - 1}{k} p^{\alpha}(1 - p)^{k}, \quad k = 0, 1, 2, \ldots, \tag{A.12}$$

where $\binom{x}{k}$ for $x \in \mathbb{R}$ and $k \in \mathbb{N}_0$ denotes an extended binomial coefficient defined
by $\binom{x}{0} = 1$ and

$$\binom{x}{k} = \frac{x(x - 1) \cdots (x - k + 1)}{k!}, \quad k > 0.$$

The moments of this distribution are

$$E(N) = \alpha(1 - p)/p \quad \text{and} \quad \mathrm{var}(N) = \alpha(1 - p)/p^2.$$

For $\alpha = r \in \mathbb{N}$ the rv N + r represents the waiting time until the rth success
in independent Bernoulli trials with success probability p, i.e. the total number of
trials that are required until we have r successes. For $\alpha = 1$ the rv N + 1 is said to
have a geometric distribution.
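
The extended binomial coefficient is what allows non-integer $\alpha$ in (A.12). The following Python sketch evaluates the probability mass function directly from the definition and checks the stated moments numerically; the function names, the parameter values and the truncation of the infinite sums at k = 400 are choices made here for illustration.

```python
def extended_binomial(x, k):
    """Extended binomial coefficient: 1 if k = 0, else x(x-1)...(x-k+1)/k!."""
    coeff = 1.0
    for i in range(k):
        coeff *= (x - i) / (i + 1)
    return coeff


def neg_binomial_pmf(k, alpha, p):
    """P(N = k) for N ~ NB(alpha, p) as in (A.12); alpha need not be an integer."""
    return extended_binomial(alpha + k - 1, k) * p**alpha * (1 - p) ** k


alpha, p = 2.5, 0.4
ks = range(400)                       # truncation point chosen for illustration
probs = [neg_binomial_pmf(k, alpha, p) for k in ks]
total = sum(probs)
mean = sum(k * q for k, q in zip(ks, probs))
var = sum(k * k * q for k, q in zip(ks, probs)) - mean**2

print(total)                          # should be very close to 1
print(mean, alpha * (1 - p) / p)      # numerical mean vs. alpha(1-p)/p
print(var, alpha * (1 - p) / p**2)    # numerical variance vs. alpha(1-p)/p^2
```

With $\alpha = 1$ the coefficient equals one for every k, and the function returns the geometric probabilities $p(1-p)^k$.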


A.2.8 Pareto 


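The rv X has a Pareto distribution, written $X \sim \mathrm{Pa}(\alpha, \kappa)$, if its df is

$$F(x) = 1 - \left(\frac{\kappa}{\kappa + x}\right)^{\alpha}, \quad \alpha, \kappa > 0,\ x \ge 0. \tag{A.13}$$

Provided that $\alpha > n$, the moments of this distribution are given by

$$E(X^n) = \frac{\kappa^n n!}{\prod_{i=1}^{n}(\alpha - i)}.$$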
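
Because the df (A.13) is continuous and strictly increasing on $x \ge 0$, its generalized inverse is an ordinary inverse, $F^{\leftarrow}(u) = \kappa((1-u)^{-1/\alpha} - 1)$, and the inversion method gives a one-line sampler. The Python sketch below is a hedged illustration: the function names, seed and parameter values are assumptions made here, and the moment function simply encodes the formula above.

```python
import math
import random


def pareto_df(x, alpha, kappa):
    """F(x) = 1 - (kappa / (kappa + x))^alpha for x >= 0, as in (A.13)."""
    return 1.0 - (kappa / (kappa + x)) ** alpha


def pareto_quantile(u, alpha, kappa):
    """Inverse of the df: the x solving F(x) = u for u in [0, 1)."""
    return kappa * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def pareto_moment(n, alpha, kappa):
    """E(X^n) = kappa^n n! / prod_{i=1}^n (alpha - i), valid only if alpha > n."""
    if alpha <= n:
        raise ValueError("the nth moment is infinite unless alpha > n")
    return kappa**n * math.factorial(n) / math.prod(alpha - i for i in range(1, n + 1))


rng = random.Random(7)
alpha, kappa = 5.0, 2.0
# Inversion method: if U ~ U(0, 1) then F^{-1}(U) has df F.
sample = [pareto_quantile(rng.random(), alpha, kappa) for _ in range(200_000)]
sample_mean = sum(sample) / len(sample)

print(sample_mean, pareto_moment(1, alpha, kappa))   # sample vs. kappa / (alpha - 1)
```

For smaller $\alpha$ the sample mean converges much more slowly, and for $\alpha \le 1$ the mean does not exist at all, which is why the moment function refuses such inputs.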


A.2.9 Stable 


The rv X has an $\alpha$-stable distribution, written $X \sim \mathrm{St}(\alpha, \beta, \gamma, \delta)$, if its characteristic
function is

$$\phi(t) = E(\exp(itX)) = \begin{cases} \exp\bigl(-\gamma^{\alpha}|t|^{\alpha}(1 - i\beta\,\mathrm{sign}(t)\tan(\tfrac12\pi\alpha)) + i\delta t\bigr), & \alpha \ne 1, \\ \exp\bigl(-\gamma|t|(1 + i\beta\,\mathrm{sign}(t)(2/\pi)\ln|t|) + i\delta t\bigr), & \alpha = 1, \end{cases} \tag{A.14}$$

where $\alpha \in (0, 2]$, $\beta \in [-1, 1]$, $\gamma > 0$ and $\delta \in \mathbb{R}$. Note that there are various
alternative parametrizations of the stable distributions and we use a parametrization
of Nolan (2005, Definition 1.8). The case $X \sim \mathrm{St}(\alpha, 1, \gamma, 0)$ for $\alpha < 1$ gives a
distribution on the positive half-axis, which we refer to as a positive stable distribution.
A simulation algorithm for a standardized variate $Z \sim \mathrm{St}(\alpha, \beta, 1, 0)$ is given
in Nolan (2005, Theorem 1.19). In the case where $\alpha \ne 1$, $X = \delta + \gamma Z$ has a
$\mathrm{St}(\alpha, \beta, \gamma, \delta)$ distribution; the case $\alpha = 1$ is more complicated.




A.3 Likelihood Inference 


This appendix summarizes the mechanics of performing likelihood inference, but 
omits theoretical details. A good starting reference for the theory is Casella and 
Berger (2002), which we refer to in this appendix where relevant. Other useful books 
include Serfling (1980), Lehmann (1983), Schervish (1995) and Stuart, Ord and 
Arnold (1999), all of which give details concerning the famous regularity conditions 
that are required for the asymptotic statements. 


A.3.1 Maximum Likelihood Estimators 


Suppose that the random vector $X = (X_1, \ldots, X_n)'$ has joint probability density (or
mass function) in some parametric family $f_X(x; \theta)$, indexed by a parameter vector
$\theta = (\theta_1, \ldots, \theta_p)'$ in a parameter space $\Theta$. We consider our data to be a realization
of X for some unknown value of $\theta$.

The likelihood function for the parameter vector $\theta$ given the data is
$L(\theta; X) = f_X(X; \theta)$ and the maximum likelihood estimator (MLE) $\hat\theta$ is the value of $\theta$
maximizing $L(\theta; X)$, or equivalently the value maximizing the log-likelihood function
$l(\theta; X) = \ln L(\theta; X)$. We will also write this estimator as $\hat\theta_n$ when we want to
emphasize its dependence on the sample size n.

For large n we expect that the estimate $\hat\theta_n$ should be close to the true value $\theta$,
and various well-known asymptotic results give information about the quality of
the estimator in large samples. In describing these results we consider the classical
situation where X is assumed to be a vector of iid components with univariate density
f so that

$$\ln L(\theta; X) = \ln \prod_{i=1}^{n} f(X_i; \theta) = \sum_{i=1}^{n} \ln L(\theta; X_i).$$
A.3.2 Asymptotic Results: Scalar Parameter 


We consider the case when p = 1 and we have a single parameter $\theta$. Under suitable
regularity conditions (see, for example, Casella and Berger 2002, p. 516), $\hat\theta_n$ may be
shown to be a consistent estimator of $\theta$ (i.e. tending to $\theta$ in probability as the sample
size n is increased). Notable among the regularity conditions are that $\theta$ should be an
identifiable parameter ($\theta \ne \tilde\theta \Rightarrow f(x; \theta) \ne f(x; \tilde\theta)$), the true parameter $\theta$ should
be an interior point of the parameter space $\Theta$, and that the support of $f(x; \theta)$ should
not depend on $\theta$.

Under stronger regularity conditions (see again Casella and Berger 2002, p. 516),
$\hat\theta_n$ may be shown to be an asymptotically efficient estimator of $\theta$, so it satisfies

$$\sqrt{n}(\hat\theta_n - \theta) \overset{d}{\to} N(0, I(\theta)^{-1}), \tag{A.15}$$

where $I(\theta)$ denotes the Fisher information of an observation, defined by

$$I(\theta) = E\left(\left(\frac{\partial}{\partial\theta} \ln L(\theta; X)\right)^2\right). \tag{A.16}$$

Under the regularity conditions, the Fisher information can generally also be
calculated as

$$I(\theta) = -E\left(\frac{\partial^2}{\partial\theta^2} \ln L(\theta; X)\right). \tag{A.17}$$


Asymptotic efficiency entails both asymptotic normality and consistency. Moreover,
it implies that, in a large enough sample, $\mathrm{var}(\hat\theta_n) \approx 1/(nI(\theta))$, where the
right-hand side is the so-called Cramér–Rao lower bound, which is a lower bound for the
variance of an unbiased estimator of $\theta$ constructed from an iid sample $X_1, \ldots, X_n$.
The MLE is efficient in the sense that it attains this lowest possible bound
asymptotically.


A.3.3 Asymptotic Results: Vector of Parameters 


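When p > 1 and we have a vector of parameters to estimate, similar results apply.
The ML estimator $\hat\theta_n$ of $\theta$ is asymptotically efficient in the sense that, as $n \to \infty$
and under suitable regularity conditions,

$$\sqrt{n}(\hat\theta_n - \theta) \overset{d}{\to} N_p(0, I(\theta)^{-1}), \tag{A.18}$$

where $I(\theta)$ denotes the expected Fisher information matrix for a single observation,
given, in analogy to (A.16) and (A.17), by

$$I(\theta) = E\left(\frac{\partial}{\partial\theta} \ln L(\theta; X)\, \frac{\partial}{\partial\theta'} \ln L(\theta; X)\right) = -E\left(\frac{\partial^2}{\partial\theta\,\partial\theta'} \ln L(\theta; X)\right).$$

The notation employed here should be taken to mean a matrix with components

$$I(\theta)_{ij} = E\left(\frac{\partial}{\partial\theta_i} \ln L(\theta; X)\, \frac{\partial}{\partial\theta_j} \ln L(\theta; X)\right) = -E\left(\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \ln L(\theta; X)\right).$$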

The convergence result (A.18) implies that, for n sufficiently large, we have

$$\hat\theta_n \sim N_p(\theta, n^{-1} I(\theta)^{-1}), \tag{A.19}$$

and this can be used to construct asymptotic confidence regions for $\theta$ or intervals
for any component $\theta_j$. In practice, it is often easier to approximate $I(\theta)$ with the
observed Fisher information matrix

$$\bar{I}(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2}{\partial\theta\,\partial\theta'} \ln L(\theta; X_i)$$

for whatever realization of X has been obtained. This should converge to the expected
information matrix by the law of large numbers and it has been suggested that in
some situations this may even lead to more accurate inference (Efron and Hinkley
1978). In either case, the information matrices depend on the unknown parameters
of the model and are usually estimated by taking $I(\hat\theta)$ or $\bar{I}(\hat\theta)$.




A.3.4 Wald Test and Confidence Intervals 
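From (A.19) we have that, for n sufficiently large,

$$Z := \frac{\hat\theta_j - \theta_j}{\mathrm{se}(\hat\theta_j)} \sim N(0, 1), \tag{A.20}$$

where $\mathrm{se}(\hat\theta_j)$ denotes an asymptotic standard error (estimate of the asymptotic
standard deviation) for $\hat\theta_j$, given by

$$\mathrm{se}(\hat\theta_j) = \sqrt{n^{-1} I(\hat\theta)^{-1}_{jj}} \quad \text{or} \quad \sqrt{n^{-1} \bar{I}(\hat\theta)^{-1}_{jj}}.$$

Equation (A.20) can be used to test the null hypothesis $H_0 : \theta_j = \theta_{j,0}$ for some
value of interest $\theta_{j,0}$ against the alternative $H_1 : \theta_j \ne \theta_{j,0}$. For an asymptotic test
of size $\alpha$ we would reject $H_0$ if $|Z| > \Phi^{-1}(1 - \tfrac12\alpha)$.
An asymptotic $100(1 - \alpha)\%$ confidence interval for $\theta_j$ consists of those values
$\theta_{j,0}$ for which the null hypothesis is not rejected and it is given by

$$\bigl(\hat\theta_j - \mathrm{se}(\hat\theta_j)\Phi^{-1}(1 - \tfrac12\alpha),\ \hat\theta_j + \mathrm{se}(\hat\theta_j)\Phi^{-1}(1 - \tfrac12\alpha)\bigr). \tag{A.21}$$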
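
For a concrete instance of (A.21), consider an iid sample from the exponential distribution (A.5). The MLE is $\hat\lambda = 1/\bar{X}$ and the Fisher information is $I(\lambda) = \lambda^{-2}$, so the standard error is $\hat\lambda/\sqrt{n}$. The Python sketch below is an illustration under these assumptions; the function name and the artificial data are chosen here and do not come from the text.

```python
import math
from statistics import NormalDist, fmean


def exponential_wald_interval(data, level=0.95):
    """MLE and Wald interval for the rate of an Exp(lambda) sample.

    For this model I(lambda) = 1 / lambda^2, so se = lambda_hat / sqrt(n).
    """
    n = len(data)
    rate_hat = 1.0 / fmean(data)                  # MLE of lambda
    se = rate_hat / math.sqrt(n)                  # sqrt(n^{-1} I(rate_hat)^{-1})
    z = NormalDist().inv_cdf(1.0 - 0.5 * (1.0 - level))
    return rate_hat, (rate_hat - z * se, rate_hat + z * se)


data = [0.5] * 100          # artificial data with sample mean 0.5
rate_hat, (lower, upper) = exponential_wald_interval(data)
print(rate_hat, lower, upper)
```

The interval is symmetric about $\hat\lambda$ by construction, which is the rigidity that the likelihood ratio interval of the next section avoids.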
A.3.5 Likelihood Ratio Test and Confidence Intervals 


Now consider testing the null hypothesis $H_0 : \theta \in \Theta_0$ against the alternative
$H_1 : \theta \in \Theta_0^c$, where $\Theta_0 \subset \Theta$. We consider the likelihood ratio test statistic

$$\lambda(X) = \frac{\sup_{\theta \in \Theta_0} L(\theta; X)}{\sup_{\theta \in \Theta} L(\theta; X)}$$

and assume, as before, that $X_1, \ldots, X_n$ are iid and that appropriate regularity
conditions apply. Under the null hypothesis it can be shown that, as $n \to \infty$,
$-2\ln\lambda(X) \sim \chi^2_{\nu}$, where the degrees-of-freedom parameter $\nu$ of the chi-squared
distribution is essentially given by the number of free parameters specified by $\Theta$
minus the number of free parameters specified by the null hypothesis $\theta \in \Theta_0$.

For example, suppose that we partition $\theta$ so that $\theta' = (\theta_1', \theta_2')$, where $\theta_1$ has
dimension q and $\theta_2$ has dimension p − q. We wish to test $H_0 : \theta_1 = \theta_{1,0}$ against
$H_1 : \theta_1 \ne \theta_{1,0}$. Writing the likelihood as $L(\theta_1, \theta_2)$, the likelihood ratio test statistic
satisfies

$$-2\ln\lambda(X) = -2\bigl(\ln L(\theta_{1,0}, \hat\theta_{2,0}; X) - \ln L(\hat\theta_1, \hat\theta_2; X)\bigr) \sim \chi^2_q,$$

asymptotically, where $\hat\theta_1$ and $\hat\theta_2$ are the unconstrained MLEs of $\theta_1$ and $\theta_2$, and $\hat\theta_{2,0}$
is the constrained MLE of $\theta_2$ under the null hypothesis. We would reject $H_0$ if
$-2\ln\lambda(X) > c_{q,1-\alpha}$, where $c_{q,1-\alpha}$ is the $(1 - \alpha)$-quantile of the $\chi^2_q$ distribution.

An asymptotic $100(1 - \alpha)\%$ confidence set for $\theta_1$ consists of the values $\theta_{1,0}$ for
which the null hypothesis $H_0 : \theta_1 = \theta_{1,0}$ is not rejected, that is

$$\{\theta_{1,0} : \ln L(\theta_{1,0}, \hat\theta_{2,0}; x) \ge \ln L(\hat\theta_1, \hat\theta_2; x) - 0.5c_{q,1-\alpha}\}.$$

In particular, if q = 1, so that we are interested only in $\theta_1$, we get the confidence
interval

$$\{\theta_{1,0} : \ln L(\theta_{1,0}, \hat\theta_{2,0}; x) \ge \ln L(\hat\theta_1, \hat\theta_2; x) - 0.5c_{1,1-\alpha}\}. \tag{A.22}$$

Note that such an interval will, in general, be asymmetric about the MLE $\hat\theta_1$, in the
sense that the distance from the MLE to the upper and lower bounds will be different.
This is in contrast to the Wald interval in (A.21), which is rigidly symmetric.

The curve $(\theta_{1,0}, \ln L(\theta_{1,0}, \hat\theta_{2,0}; x))$ is sometimes known as the profile log-likelihood
curve for $\theta_1$ and attains its maximum at $\hat\theta_1$.
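
The asymmetry is easy to see in the simplest possible setting, an iid exponential sample with a single rate parameter, so that q = p = 1 and there is no nuisance parameter to profile out. The Python sketch below finds the two endpoints of (A.22) by bisection on the log-likelihood $n\ln\lambda - \lambda\sum x_i$. The function names, the brackets used for the root search and the artificial data are assumptions made for this illustration. The cutoff $c_{1,1-\alpha}$ is computed as the square of the standard normal $(1-\tfrac12\alpha)$-quantile, which is valid because a $\chi^2_1$ variable is a squared standard normal.

```python
import math
from statistics import NormalDist


def exp_loglik(rate, data):
    """Log-likelihood of an iid Exp(rate) sample."""
    return len(data) * math.log(rate) - rate * sum(data)


def bisect_root(f, a, b, tol=1e-10):
    """Root of f on [a, b] by bisection, assuming f(a) and f(b) differ in sign."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if (fm > 0) == (fa > 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)


def exponential_lr_interval(data, level=0.95):
    rate_hat = len(data) / sum(data)                               # MLE
    cutoff = NormalDist().inv_cdf(1 - 0.5 * (1 - level)) ** 2      # chi^2_1 quantile
    # h(r) >= 0 exactly for the rates r inside the confidence interval.
    h = lambda r: exp_loglik(r, data) - exp_loglik(rate_hat, data) + 0.5 * cutoff
    lower = bisect_root(h, 1e-9 * rate_hat, rate_hat)
    upper = bisect_root(h, rate_hat, 10.0 * rate_hat)
    return rate_hat, (lower, upper)


data = [0.5] * 100          # artificial data with sample mean 0.5
rate_hat, (lower, upper) = exponential_lr_interval(data)
print(rate_hat - lower, upper - rate_hat)   # unequal distances to the two bounds
```

With a nuisance parameter present, `exp_loglik` would have to be replaced by the profile log-likelihood, which requires a constrained maximization at each candidate value.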


A.3.6 Akaike Information Criterion 


The likelihood ratio test is applicable to the comparison of nested models, i.e. situ- 
ations where one model forms a special case of a more general model when certain 
parameter values are constrained. We often encounter situations where we would like 
to compare non-nested models with possibly quite different numbers of parameters. 

Suppose we have m models $M_1, \ldots, M_m$ and that model j has $k_j$ parameters
denoted by $\theta_j = (\theta_{j1}, \ldots, \theta_{jk_j})'$ and a likelihood function $L_j(\theta_j; X)$. In Akaike’s
approach we choose the model minimizing

$$\mathrm{AIC}(M_j) = -2\ln L_j(\hat\theta_j; X) + 2k_j,$$

where $\hat\theta_j$ denotes the MLE of $\theta_j$. The AIC number essentially imposes a penalty
equal to the number of model parameters $k_j$ on the value of the log-likelihood at the
maximum. The model favoured is the one for which the penalized log-likelihood
$\ln L_j(\hat\theta_j; X) - k_j$ is largest. There are alternatives to the AIC, such as the Bayesian
information criterion (BIC) of Schwarz, which impose different penalties for the
number of parameters. See Burnham and Anderson (2002) for more about model
comparison using these criteria.
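
As a small illustration of a non-nested comparison, the following Python sketch computes the AIC for an exponential model ($k = 1$) and a lognormal model ($k = 2$) fitted to the same positive data. Both models have closed-form MLEs, so the maximized log-likelihoods can be written down directly. The data set and the function names are invented for this example.

```python
import math


def aic(max_loglik, k):
    """AIC = -2 * (maximized log-likelihood) + 2 * (number of parameters)."""
    return -2.0 * max_loglik + 2 * k


def exponential_max_loglik(data):
    """Log-likelihood of Exp(lambda) at the MLE lambda_hat = 1 / mean."""
    n = len(data)
    return -n * math.log(sum(data) / n) - n


def lognormal_max_loglik(data):
    """Log-likelihood of the lognormal model at the MLEs of mu and sigma^2."""
    n = len(data)
    logs = [math.log(x) for x in data]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n   # ML (not unbiased) variance
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n


data = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.95, 1.05]   # artificial, tightly clustered
scores = {
    "exponential": aic(exponential_max_loglik(data), 1),
    "lognormal": aic(lognormal_max_loglik(data), 2),
}
best = min(scores, key=scores.get)
print(scores, best)
```

For these tightly clustered observations the lognormal model is preferred despite its extra parameter, because the gain in log-likelihood far exceeds the penalty.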
relation to Value-at-Risk, 44
scaling, 53 
shortfall-to-quantile ratio, 47, 283 
exponential distribution 
in MDA of Gumbel, 267 
lack-of-memory property, 277 
exponentially weighted moving-average, 
see EWMA method 
extremal index, 270 
extreme value copulas, 311 
copula domain of attraction, 314 
dependence function of, 312 
examples of 
t-EV copula, 316 
Galambos, 313 
Gumbel, 312 
in models of multivariate threshold 
exceedances, 319 
Pickands representation of, 312 
tail dependence of, 315 
extreme value theory 
clustering of extremes, 121, 270, 303 
conditional EVT, 57, 291 
extreme value distribution 
multivariate, 311 
univariate, see GEV distribution 
for operational loss data, 469 
maxima, see maxima 
motivation for, 20, 121
multivariate
maxima, see maxima 
threshold exceedances, see 
exceedances of thresholds 

POT model, 301 

threshold exceedances, see 
exceedances of thresholds 


F distribution, 92, 496 
factor copula models (credit), 444 
default contagion and, see default 
contagion 
examples of 
Li’s model, 445 
with Archimedean copulas, 446 
with Gauss copula, 445 
mixture representation, 444 
simulation of, 444 
factor models, 103 
latent factors, 105 
macroeconomic, fundamental and 
statistical, 114 
multivariate GARCH and, 179 
observed factors, 105 
principal components, see principal 
component analysis 
regression analysis of, 106 
statistical factor analysis, 105 
Feynman–Kac formula, 422
filtration, 393 
financial mathematics 
quantitative risk management and, 21 
textbooks, 342 
firm-value models, 328, 331 
endogenous default barrier, 335 
first-passage-time models, 335 
with incomplete accounting 
information, 330 
first-to-default swap, 438, 451 
Fisher–Tippett–Gnedenko Theorem, 266
Fréchet 
bounds, 188, 200 
distribution, 265 
MDA of, 267, 268, 293 
problems, 248 
frailty, 444 
Frank copula, 220 


Galambos copula, 313 
as EV limit for Clayton survival 
copula, 315 
as extreme value copula, 313 
limiting upper threshold copula of, 
324
gamma distribution, 75, 496
in MDA of Gumbel, 294 
gamma of option, 30 
GARCH models, 145 
combined with ARMA, 148 
estimation of 
using ML, 150 
using QML, 152, 153 
extremal index of, 271 
IGARCH, 148 
kurtosis of, 146 
multivariate, see multivariate 
GARCH models 
parallels with ARMA, 146, 147 
residual analysis, 154 
stationarity of, 145, 147 
tail behaviour of, 296 
threshold GARCH, 150 
use in risk measurement, 57, 162 
volatility forecasting with, 158 
with leverage, 149 
Gauss copula, 190 
asymptotic independence of, 211 
credit risk and, 347, 445 
estimation of 
using ML, 234 
using Spearman’s rho, 230 
joint quantile exceedance 
probabilities, 212 
rank correlations for, 215 
simulation of, 193 
Gaussian distribution, see normal 
distribution 
generalized extreme value distribution, 
see GEV distribution 
generalized hyperbolic distributions, 78 
elliptically symmetric case, 75 
EM estimation of, 81 
special cases 
hyperbolic, 80 
NIG, 80 
skewed t, 80 
variance gamma, 80 
variance–covariance method and, 79
generalized inverse, 39, 185, 494 
generalized inverse Gaussian 
distribution, see GIG distribution 
generalized linear mixed models, see 
GLMMs 
generalized Pareto distribution, see GPD 
generalized scenarios, 37, 244 
geometric Brownian motion, see 
Black–Scholes model
GEV distribution, 265
as limit for maxima, 265 
estimation using ML, 271 
GIG distribution, 75, 497 
in MDA of Gumbel, 294 
GLMMs, 379 
estimation of, 380 
Bayesian inference, 380 
mixture models in credit risk as, 377 
Gnedenko’s Theorem, 268, 269 
GPD, 275 
as limit for excess distribution, 277 
likelihood inference, 278 
tail model based on, 282 
confidence intervals for, 283 
estimation of ES, 283 
estimation of VaR, 283 
Greeks, 30 
Greenspan, Alan, 15, 20, 387 
Gumbel 
copula, 192, 220 
as extreme value copula, 312 
Kendall’s tau for, 222 
upper tail dependence of, 209 
distribution, 265 
MDA of, 267, 294 


Höffding’s lemma, 203
Hawkes process, see self-exciting 
processes 
hazard rate, 393 
cumulative hazard function, 393 
with additional information, 395 
Hill estimator, 286 
Hill plot, 287 
tail estimator based on, 289 
comparison with GPD approach, 
289 
historical-simulation method, 50, 57 
conditional version, 52, 57, 59 
critique of, 51 
empirical risk measure estimation in, 
51 
extensions of, 51 
hyperbolic distribution, 80 


IGARCH (integrated GARCH), 148 
immunization of bond portfolio, 31 
importance sampling, 367 

application to Bernoulli mixture 

models, 370 

density, 368 

exponential tilting and, 369 

for general probability spaces, 370
incomplete markets, 413
inhomogeneous Poisson process, 487 
time changes and, 489 
insurance analytics, 471 
literature on, 491 
the case for, 471 
inverse gamma distribution, 497 
in MDA of Fréchet, 294 


Karamata’s Theorem, 495 
Kendall’s tau, 97, 206 
calculation of 
for Archimedean copulas, 221 
for Gaussian and t copulas, 215,
217 
estimation of t copula and, 231 
sample estimate of, 229 
KMV model, 336 
distance to default (DD), 337 
equivalent Bernoulli mixture model, 
361 
expected default frequency (EDF), 
337 
portfolio version of, 347 
kurtosis, 69, 70, 121 


lead-lag effect, 165 
LGD (loss given default), 344 
linearization of loss, 27 
in variance–covariance method, 48
linearized loss operator, 27 
liquidity risk, 3, 41 
Ljung–Box test, 119, 134
log-returns, 29 
generalized hyperbolic models for, 84 
longer-interval returns, 122 
non-normality of, 70 
stylized facts of, 117 
loss distribution, 26 
conditional, 26, 28 
issue of holding period, 27 
operational, 467 
P&L, 26 
risk measures based on, 35, 37, 43 
unconditional, 26, 28 
loss operator, 27 
LT-Archimedean copulas, 224 
p-factor, 227 
one-factor, 224 
LTCM case, 7, 20, 23, 41 


mapping of risks, 26 
examples, 29 
bond portfolio, 30 
currency forwards, 32
European call option, 29
portfolio of risky loans, 32 
stock portfolio, 29 
time units convention, 27 
market completeness, 406 
market risk, 2 
regulatory treatment of, 8, 9, 11 
standard methods for, 48, 55 
use of time series methods, 160 
Markov chains (in continuous time), 457 
Markowitz portfolio optimization, 6 
martingale, 394 
default intensity, 400 
for conditionally independent 
defaults, 434 
general characterization, 448, 449 
hazard rate and, 400, 450 
in factor copula models, 454 
modelling, 408 
application to CDS spreads, 409 
martingale-difference sequence, 127, 166 
maxima, 264 
block maxima method, 271 
estimating return levels and 
periods, 273 
Fisher–Tippett–Gnedenko Theorem,
266 
GEV distribution as limit for, 265 
maximum domain of attraction, 266, 
267 
of Fréchet, 267, 268, 286, 293 
of Gumbel, 267, 269, 294 
of Weibull, 269 
models for minima, 267 
multivariate, 314 
multivariate, 311 
block maxima method, 317 
maximum domain of attraction, 
311 
of stationary time series, 270 
maximum domain of attraction, see 
maxima 
maximum likelihood inference, 499 
MDA (maximum domain of attraction), 
see maxima 
mean excess 
function, 276 
plot, 279 
Merton model, 331 
extensions, 335, 342 
modelling of default, 331 
multivariate version, 342 
pricing of equity and debt, 332
meta distributions, 192
meta-t distribution, 193 
meta-Gaussian distribution, 193 
MGARCH model, see multivariate 
GARCH models 
mixed Poisson 
distributions, 482 
example of negative binomial, 483 
process, 490 
mixture models (credit), 328, 352 
Bernoulli mixture models, see 
Bernoulli mixture models 
Poisson mixture models, 353, 379 
CreditRisk+, see CreditRisk+ 
ML, see maximum likelihood inference 
model risk, 3 
in credit risk models, 350, 364 
models with interacting intensities 
(credit), 456 
examples of default intensities, 459 
Markov chains and, 457 
Modigliani–Miller Theorem, 17
Monte Carlo method, 52 
application to credit risk models, 367 
critique of, 52 
importance sampling, see importance 
sampling 
rare-event simulation, 367 
multivariate distribution, 62 
elliptical, see elliptical distributions 
generalized hyperbolic, see 
generalized hyperbolic 
distributions 
normal, see normal distribution 
normal mixture, see normal mixture 
distributions 
t, see t distribution 
multivariate extreme value theory, see 
extreme value theory 
multivariate GARCH models, 170 
estimation using ML, 178 
examples of 
BEKK, 177 
CCC, 172 
DCC, 174 
DVEC, 175 
orthogonal GARCH, 181 
PC-GARCH, 181 
pure diagonal, 173 
VEC, 175 
general structure, 170 
use in risk measurement, 57, 182 
negative binomial distribution, 498 
as mixed Poisson distribution, 483 
in Panjer class, 480 
NIG distribution, 80 
normal distribution 
expected shortfall, 45 
multivariate, 66 
copula of, 190 
in variance-covariance method, 
49 
properties of, 67 
simulation of, 66 
spherical case, 90 
testing for, 69 
testing for, 68 
unsuitability for log-returns, 70 
Value-at-Risk, 39 
normal inverse Gaussian distribution, 80 
normal mixture distributions, 73 
copulas of, 210 
examples of 
t distribution, 75 
generalized hyperbolic, 75, 78 
two point mixture, 75 
mean-variance mixtures, 77 
tail behaviour of, 295 
variance mixtures, 73 
simulation of, 76 
spherical case, 90 
notional-amount approach, 34 


operational risk, 3 
approaches to modelling, 465, 466 
advanced measurement (AM), 
466 
basic indicator (BI), 465 
standardized (S), 465 
data issues, 468 
regulatory treatment of, 12, 463 
operational time, 399 
orthogonal GARCH model, 181 


P&L distribution, 26 
Panjer 
distribution class, 480 
recursion, 480 
Pareto distribution, 196, 498 
in MDA of Fréchet, 267, 268 
PCA, see principal component analysis 
peaks-over-threshold model, see POT 
model 
physical measure, 401 
Pickands-Balkema-de Haan Theorem, 
277 
point processes, 298, 299 
counting processes, 485 
point process of exceedances, 300 
Poisson point process, 299 
self-exciting processes, 306 
Poisson mixture distributions, see mixed 
Poisson distributions 
Poisson process, 484, 485 
as counting process, 485 
as limit for exceedance process, 300 
characterizations of, 486 
in POT model, 301 
inhomogeneous, 487 
example of records, 488 
time changes and, 489 
multivariate version of, 487 
Poisson cluster process, 303 
portmanteau tests, 134 
POT model, 301 
as two-dimensional Poisson process, 
301 
estimation using ML, 302 
self-exciting version of, 307 
unsuitability for financial time series, 
303 
principal component 
analysis, 109 
link to factor models, 111 
multivariate GARCH and, 181 
GARCH model, 181 
probability transform, 185 
profile likelihood, 501 
quantile estimation and, 284 
profit-and-loss distribution, 26 
pseudo-maximum likelihood copula 
estimation, 234 


QIS, see Quantitative Impact Studies 
QML, see quasi-maximum likelihood 
inference 
QQplot, 68 
quantile 
function, 39, 186, 495 
transform, 185 
Quantitative Impact Studies, 9 
operational risk and, 464, 468 
quasi-maximum likelihood inference, 
152, 153, 279 


radial symmetry, 196 

rank correlation, 206 
Kendall’s, see Kendall’s tau 
properties of, 207 
sample rank correlations, 229 
Spearman’s, see Spearman’s rho 
recovery modelling 
for corporate bonds, 414 
model-risk issues, 365 
recovery of market value, 419 
reduced-form models (credit), 328, 385 
for portfolio credit risk, 429 
limitations, 420 
regularly varying function, 268 
regulation, 8, 10, 43 
Basel II, see Basel II 
Solvency 2, see Solvency 2 
regulatory capital, 43 
renewal process, 487 
return 
level, 273 
period, 273 
rho of option, 30 
risk, 1 
credit risk, see credit risk 
history of, 5 
market risk, see market risk 
operational risk, see operational risk 
overview of risk types, 2 
randomness and, 1 
reasons for managing, 15 
risk factors, 26 
mapping, 26 
time units convention, 27 
risk measurement, 4, 34, 55 
approaches 
factor-sensitivity measures, 35 
loss-distribution-based risk 
measures, 35 
notional-amount approach, 34 
scenario-based risk measures, 36 
conditional versus unconditional, 28, 
57 
standard market risk methods, 48 
risk measures 
backtesting of, 55 
based on loss distributions, 35 
calculating bounds for, 249 
coherence of, see coherent risk 
measures 
convex, 241, 247 
examples of 
CVaR, 47 
drawdowns, 47 
expected shortfall, see expected 
shortfall 
Fischer premium principle, 244 
partial moments, 44 
semivariance, 44 
tail conditional expectation, 47 
Value-at-Risk, see Value-at-Risk 
variance, 43 
worst conditional expectation, 47 
scaling, 53 
scenario-based, 36, 244 
spectral, 251 
uses of, 34 
risk-factor changes, 26 
examples of, 56 
stylized facts of, 117 
risk-neutral 
measure, 333, 401 
pricing rule, 333, 406 
RiskMetrics 
documentation, 33 
the birth of VaR and, 9 
treatment of bonds, 32 
robust statistics, 96 
RORAC (return on risk-adjusted 
capital), 256 


sample mean excess plot, 279 
scaling of risk measures, 53 
Monte Carlo approach, 54 
square-root-of-time, 54 
scenario-based risk measures, 36 
coherence and, 244 
self-exciting processes, 306, 488 
self-exciting POT model, 307 
predictable marks, 308 
risk measures for, 309 
unpredictable marks, 308 
semivariance, 44 
shortfall contributions, 260 
skewed t distribution, 80 
skewness, 69, 70, 121 
Sklar’s Theorem, 186 
slowly varying function, 268, 495 
Solvency 2, 13 
Spearman’s rho, 207 
for Gauss copula, 215 
use in estimation of, 230 
sample estimate of, 229 
spherical distributions, 89 
tail behaviour of, 295 
square-root processes, see CIR model 
square-root-of-time rule, 54 
stable distribution, 224, 498 
stationarity, 126, 165 
strict white noise, 127, 166 
structural models, see firm-value 
models 
Student t distribution, see t distribution 
stylized facts 
of financial time series, 117 
multivariate version, 123 
of operational risk data, 468 
survival copulas, 195 


t copula, 191 
estimation of 
using Kendall’s tau, 231 
using ML, 235 
grouped, 218 
joint quantile exceedance 
probabilities, 212 
Kendall’s tau for, 217 
simulation of, 193 
skewed, 217 
tail dependence of, 211 
t distribution 
expected shortfall, 45 
in MDA of Fréchet, 293 
multivariate, 75 
copula of, 191 
skewed version, 80 
Value-at-Risk, 39 
tail 
dependence, 194, 208 
in t copula, 211 
in Archimedean copulas, 222 
in elliptical distributions, 297 
in Gumbel and Clayton copulas, 
209 
equivalence, 294 
index, 268, 286 
tails of distributions, 293 
compound sums, 484 
mixture distributions, 295 
regularly varying, 268, 293 
threshold copulas, 322 
lower, 322 
limits for, 322 
upper, 323 
limits for, 323 
use in modelling, 325 
threshold exceedances (EVT), see 
exceedances of thresholds 
threshold models (credit), 343 
copulas and, 346 
equivalent Bernoulli mixture models 
and, 359 
examples of 
in industry, 347 
Li’s model, 347 
using t copula, 348, 361 
using Archimedean copulas, 349, 
361 
using Clayton copula, 362 
using Gauss copula, 347 
using normal mean-variance 
mixture copulas, 348, 360 

model risk in, 350 

type of distribution, 265, 494 


Value-at-Risk, 37 
additivity for comonotonic risks, 250 
as quantile, 38 
backtesting of, 55, 162 
bounds on VaR of portfolio, 250 
calculation of 
for t distribution, 39, 46 
for GPD tail model, 283 
for normal distribution, 39, 46 
capital allocation with, 258 
dangers of portfolio optimization 
with, 246 
non-coherence of, 241 
origins of, 8 
pictorial representation of, 39 
practical issues concerning, 40 
regulatory capital and, 43 
scaling, 53 
shortfall-to-quantile ratio, 47, 283 
VaR, see Value-at-Risk 
VAR (vector AR), 169 
variance-covariance method, 48, 57 
critique of, 49 
extensions, 50 
with generalized hyperbolic 
distribution, 79 
variance-gamma distribution, 80 
VARMA (vector ARMA), 168 
vector GARCH (VEC) model, 175 
vega of option, 30 
volatility 
clustering, 117 
forecasting, 158 
with EWMA, 159 
with GARCH, 158 
von Mises distributions, 294 


Weibull distribution, 265 
white noise, 127, 166 


yield of bond, 31 


